Graduate Texts in Mathematics 135


Editorial Board 
S. Axler 
K.A. Ribet 




Steven Roman 


Advanced Linear Algebra 
Third Edition 


Springer


Steven Roman
8 Night Star
Irvine, CA 92603
USA
sroman@romanpress.com


Editorial Board

S. Axler
Mathematics Department
San Francisco State University
San Francisco, CA 94132
USA
axler@sfsu.edu

K.A. Ribet
Mathematics Department
University of California at Berkeley
Berkeley, CA 94720-3840
USA
ribet@math.berkeley.edu


ISBN-13: 978-0-387-72828-5
e-ISBN-13: 978-0-387-72831-5

Library of Congress Control Number: 2007934001
Mathematics Subject Classification (2000): 15-01


© 2008 Springer Science+Business Media, LLC

All rights reserved. This work may not be translated or copied in whole or in part without the written
permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York,
NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in
connection with any form of information storage and retrieval, electronic adaptation, computer software,
or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are
not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject
to proprietary rights.

Printed on acid-free paper.

987654321

springer.com


To Donna 
and to 
Rashelle, Carol and Dan 


Preface to the Third Edition 


Let me begin by thanking the readers of the second edition for their many 
helpful comments and suggestions, with special thanks to Joe Kidd and Nam 
Trang. For the third edition, I have corrected all known errors, polished and 
refined some arguments (such as the discussion of reflexivity, the rational 
canonical form, best approximations and the definitions of tensor products) and 
upgraded some proofs that were originally done only for finite-dimensional/rank 
cases. I have also moved some of the material on projection operators to an 
earlier position in the text. 


A few new theorems have been added in this edition, including the spectral 
mapping theorem and a theorem to the effect that $\dim(V) \le \dim(V^*)$, with
equality if and only if V is finite-dimensional. 


I have also added a new chapter on associative algebras that includes the well- 
known characterizations of the finite-dimensional division algebras over the real 
field (a theorem of Frobenius) and over a finite field (Wedderburn's theorem). 
The reference section has been enlarged considerably, with over a hundred 
references to books on linear algebra. 


Steven Roman Irvine, California, May 2007 


Preface to the Second Edition 


Let me begin by thanking the readers of the first edition for their many helpful 
comments and suggestions. The second edition represents a major change from 
the first edition. Indeed, one might say that it is a totally new book, with the 
exception of the general range of topics covered. 


The text has been completely rewritten. I hope that an additional 12 years and 
roughly 20 books worth of experience has enabled me to improve the quality of 
my exposition. Also, the exercise sets have been completely rewritten. 


The second edition contains two new chapters: a chapter on convexity, 
separation and positive solutions to linear systems (Chapter 15) and a chapter on 
the QR decomposition, singular values and pseudoinverses (Chapter 17). The 
treatments of tensor products and the umbral calculus have been greatly 
expanded and I have included discussions of determinants (in the chapter on 
tensor products), the complexification of a real vector space, Schur's theorem 
and Geršgorin disks.


Steven Roman Irvine, California February 2005 


Preface to the First Edition 


This book is a thorough introduction to linear algebra, for the graduate or 
advanced undergraduate student. Prerequisites are limited to a knowledge of the 
basic properties of matrices and determinants. However, since we cover the 
basics of vector spaces and linear transformations rather rapidly, a prior course 
in linear algebra (even at the sophomore level), along with a certain measure of 
“mathematical maturity,” is highly desirable. 


Chapter 0 contains a summary of certain topics in modern algebra that are 
required for the sequel. This chapter should be skimmed quickly and then used 
primarily as a reference. Chapters 1-3 contain a discussion of the basic 
properties of vector spaces and linear transformations. 


Chapter 4 is devoted to a discussion of modules, emphasizing a comparison 
between the properties of modules and those of vector spaces. Chapter 5 
provides more on modules. The main goals of this chapter are to prove that any 
two bases of a free module have the same cardinality and to introduce 
Noetherian modules. However, the instructor may simply skim over this 
chapter, omitting all proofs. Chapter 6 is devoted to the theory of modules over 
a principal ideal domain, establishing the cyclic decomposition theorem for 
finitely generated modules. This theorem is the key to the structure theorems for 
finite-dimensional linear operators, discussed in Chapters 7 and 8. 


Chapter 9 is devoted to real and complex inner product spaces. The emphasis 
here is on the finite-dimensional case, in order to arrive as quickly as possible at 
the finite-dimensional spectral theorem for normal operators, in Chapter 10. 
However, we have endeavored to state as many results as is convenient for 
vector spaces of arbitrary dimension. 


The second part of the book consists of a collection of independent topics, with 
the one exception that Chapter 13 requires Chapter 12. Chapter 11 is on metric 
vector spaces, where we describe the structure of symplectic and orthogonal 
geometries over various base fields. Chapter 12 contains enough material on 
metric spaces to allow a unified treatment of topological issues for the basic
Hilbert space theory of Chapter 13. The rather lengthy proof that every metric
space can be embedded in its completion may be omitted. 


Chapter 14 contains a brief introduction to tensor products. In order to motivate 
the universal property of tensor products, without getting too involved in 
categorical terminology, we first treat both free vector spaces and the familiar 
direct sum, in a universal way. Chapter 15 (Chapter 16 in the second edition) is 
on affine geometry, emphasizing algebraic, rather than geometric, concepts. 


The final chapter provides an introduction to a relatively new subject, called the 
umbral calculus. This is an algebraic theory used to study certain types of 
polynomial functions that play an important role in applied mathematics. We 
give only a brief introduction to the subject — emphasizing the algebraic 
aspects, rather than the applications. This is the first time that this subject has 
appeared in a true textbook. 


One final comment. Unless otherwise mentioned, omission of a proof in the text 
is a tacit suggestion that the reader attempt to supply one. 


Steven Roman Irvine, California 


Contents 


Preface to the Third Edition
Preface to the Second Edition
Preface to the First Edition


Preliminaries
Part 1: Preliminaries
Part 2: Algebraic Structures


Part I: Basic Linear Algebra


1 Vector Spaces
Vector Spaces
Subspaces
Direct Sums
Spanning Sets and Linear Independence
The Dimension of a Vector Space
Ordered Bases and Coordinate Matrices
The Row and Column Spaces of a Matrix
The Complexification of a Real Vector Space
Exercises


2 Linear Transformations
Linear Transformations
The Kernel and Image of a Linear Transformation
Isomorphisms
The Rank Plus Nullity Theorem
Linear Transformations from $F^n$ to $F^m$
Change of Basis Matrices
The Matrix of a Linear Transformation
Change of Bases for Linear Transformations
Equivalence of Matrices
Similarity of Matrices
Similarity of Operators
Invariant Subspaces and Reducing Pairs
Projection Operators
Topological Vector Spaces
Linear Operators on $V^{\mathbb{C}}$
Exercises


3 The Isomorphism Theorems
Quotient Spaces
The Universal Property of Quotients and the First Isomorphism Theorem
Quotient Spaces, Complements and Codimension
Additional Isomorphism Theorems
Linear Functionals
Dual Bases
Reflexivity
Annihilators
Operator Adjoints
Exercises


4 Modules I: Basic Properties
Motivation
Modules
Submodules
Spanning Sets
Linear Independence
Torsion Elements
Annihilators
Free Modules
Homomorphisms
Quotient Modules
The Correspondence and Isomorphism Theorems
Direct Sums and Direct Summands
Modules Are Not as Nice as Vector Spaces
Exercises


5 Modules II: Free and Noetherian Modules
The Rank of a Free Module
Free Modules and Epimorphisms
Noetherian Modules
The Hilbert Basis Theorem
Exercises


6 Modules over a Principal Ideal Domain
Annihilators and Orders
Cyclic Modules
Free Modules over a Principal Ideal Domain
Torsion-Free and Free Modules
The Primary Cyclic Decomposition Theorem
The Invariant Factor Decomposition
Characterizing Cyclic Modules
Indecomposable Modules
Exercises


7 The Structure of a Linear Operator
The Module Associated with a Linear Operator
The Primary Cyclic Decomposition of $V_\tau$
The Characteristic Polynomial
Cyclic and Indecomposable Modules
The Big Picture
The Rational Canonical Form
Exercises


8 Eigenvalues and Eigenvectors
Eigenvalues and Eigenvectors
Geometric and Algebraic Multiplicities
The Jordan Canonical Form
Triangularizability and Schur's Theorem
Diagonalizable Operators
Exercises


9 Real and Complex Inner Product Spaces
Norm and Distance
Isometries
Orthogonality
Orthogonal and Orthonormal Sets
The Projection Theorem and Best Approximations
The Riesz Representation Theorem
Exercises


10 Structure Theory for Normal Operators
The Adjoint of a Linear Operator
Orthogonal Projections
Unitary Diagonalizability
Normal Operators
Special Types of Normal Operators
Self-Adjoint Operators
Unitary Operators and Isometries
The Structure of Normal Operators
Functional Calculus
Positive Operators
The Polar Decomposition of an Operator
Exercises


Part II: Topics


11 Metric Vector Spaces: The Theory of Bilinear Forms
Symmetric, Skew-Symmetric and Alternate Forms
The Matrix of a Bilinear Form
Quadratic Forms
Orthogonality
Linear Functionals
Orthogonal Complements and Orthogonal Direct Sums
Isometries
Hyperbolic Spaces
Nonsingular Completions of a Subspace
The Witt Theorems: A Preview
The Classification Problem for Metric Vector Spaces
Symplectic Geometry
The Structure of Orthogonal Geometries: Orthogonal Bases
The Classification of Orthogonal Geometries: Canonical Forms
The Orthogonal Group
The Witt Theorems for Orthogonal Geometries
Maximal Hyperbolic Subspaces of an Orthogonal Geometry
Exercises


12 Metric Spaces
The Definition
Open and Closed Sets
Convergence in a Metric Space
The Closure of a Set
Dense Subsets
Continuity
Completeness
Isometries
The Completion of a Metric Space
Exercises


13 Hilbert Spaces
A Brief Review
Hilbert Spaces
Infinite Series
An Approximation Problem
Hilbert Bases
Fourier Expansions
A Characterization of Hilbert Bases
Hilbert Dimension
A Characterization of Hilbert Spaces
The Riesz Representation Theorem
Exercises


14 Tensor Products
Universality
Bilinear Maps
Tensor Products
When Is a Tensor Product Zero?
Coordinate Matrices and Rank
Characterizing Vectors in a Tensor Product
Defining Linear Transformations on a Tensor Product
The Tensor Product of Linear Transformations
Change of Base Field
Multilinear Maps and Iterated Tensor Products
Tensor Spaces
Special Multilinear Maps
Graded Algebras
The Symmetric and Antisymmetric Tensor Algebras
The Determinant
Exercises


15 Positive Solutions to Linear Systems: Convexity and Separation
Convex, Closed and Compact Sets
Convex Hulls
Linear and Affine Hyperplanes
Separation
Exercises


16 Affine Geometry
Affine Geometry
Affine Combinations
Affine Hulls
The Lattice of Flats
Affine Independence
Affine Transformations
Projective Geometry
Exercises


17 Singular Values and the Moore-Penrose Inverse
Singular Values
The Moore-Penrose Generalized Inverse
Least Squares Approximation
Exercises


18 An Introduction to Algebras
Motivation
Associative Algebras
Division Algebras
Exercises


19 The Umbral Calculus
Formal Power Series
The Umbral Algebra
Formal Power Series as Linear Operators
Sheffer Sequences
Examples of Sheffer Sequences
Umbral Operators and Umbral Shifts
Continuous Operators on the Umbral Algebra
Operator Adjoints
Umbral Operators and Automorphisms of the Umbral Algebra
Umbral Shifts and Derivations of the Umbral Algebra
The Transfer Formulas
A Final Remark
Exercises


References
Index of Symbols
Index


Preliminaries 


In this chapter, we briefly discuss some topics that are needed for the sequel. 
This chapter should be skimmed quickly and used primarily as a reference. 


Part 1 Preliminaries 


Multisets 


The following simple concept is much more useful than its infrequent 
appearance would indicate. 


Definition Let $S$ be a nonempty set. A multiset $M$ with underlying set $S$ is a
set of ordered pairs

$$M = \{(s_i, n_i) \mid s_i \in S,\ n_i \in \mathbb{Z}^+,\ s_i \ne s_j \text{ for } i \ne j\}$$

where $\mathbb{Z}^+ = \{1, 2, \dots\}$. The number $n_i$ is referred to as the multiplicity of the
element $s_i$ in $M$. If the underlying set of a multiset is finite, we say that the
multiset is finite. The size of a finite multiset $M$ is the sum of the multiplicities
of all of its elements. □


For example, $M = \{(a,2),(b,3),(c,1)\}$ is a multiset with underlying set
$S = \{a,b,c\}$. The element $a$ has multiplicity 2. One often writes out the
elements of a multiset according to multiplicities, as in $M = \{a,a,b,b,b,c\}$.


Of course, two multisets are equal if their underlying sets are equal and if the
multiplicity of each element in the common underlying set is the same in both
multisets.
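
The definition can be tried out with a minimal Python sketch. It assumes a finite multiset is stored as a mapping from elements to multiplicities, here the standard library's `collections.Counter`; the variable names are illustrative only.

```python
from collections import Counter

# A finite multiset stored as a mapping: element -> multiplicity.
M = Counter({"a": 2, "b": 3, "c": 1})

underlying_set = set(M)          # the elements, ignoring multiplicity
size = sum(M.values())           # the sum of all multiplicities
listed = sorted(M.elements())    # elements written out according to multiplicity

# Equality compares underlying sets and multiplicities, not the order of listing.
same = M == Counter(["b", "a", "b", "c", "a", "b"])
```

Here `listed` corresponds to writing the multiset as $\{a,a,b,b,b,c\}$.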


Matrices 


The set of $m \times n$ matrices with entries in a field $F$ is denoted by $\mathcal{M}_{m,n}(F)$ or
by $\mathcal{M}_{m,n}$ when the field does not require mention. The set $\mathcal{M}_{n,n}(F)$ is denoted
by $\mathcal{M}_n(F)$ or $\mathcal{M}_n$. If $A \in \mathcal{M}$, the $(i,j)$th entry of $A$ will be denoted by $A_{i,j}$.
The identity matrix of size $n \times n$ is denoted by $I_n$. The elements of the base
field $F$ are called scalars. We expect that the reader is familiar with the basic
properties of matrices, including matrix addition and multiplication.


The main diagonal of an $m \times n$ matrix $A$ is the sequence of entries

$$A_{1,1}, A_{2,2}, \dots, A_{k,k}$$

where $k = \min\{m, n\}$.


Definition The transpose of $A \in \mathcal{M}_{m,n}$ is the matrix $A^t$ defined by

$$(A^t)_{i,j} = A_{j,i}$$

A matrix $A$ is symmetric if $A = A^t$ and skew-symmetric if $A^t = -A$. □


Theorem 0.1 (Properties of the transpose) Let $A, B \in \mathcal{M}_{m,n}$. Then
1) $(A^t)^t = A$
2) $(A + B)^t = A^t + B^t$
3) $(rA)^t = rA^t$ for all $r \in F$
4) $(AB)^t = B^t A^t$ provided that the product $AB$ is defined
5) $\det(A^t) = \det(A)$. □


Partitioning and Matrix Multiplication 


Let $M$ be a matrix of size $m \times n$. If $B \subseteq \{1,\dots,m\}$ and $C \subseteq \{1,\dots,n\}$, then
the submatrix $M[B,C]$ is the matrix obtained from $M$ by keeping only the
rows with index in $B$ and the columns with index in $C$. Thus, all other rows and
columns are discarded and $M[B,C]$ has size $|B| \times |C|$.


Suppose that $M \in \mathcal{M}_{m,n}$ and $N \in \mathcal{M}_{n,k}$. Let

1) $\mathcal{P} = \{B_1,\dots,B_p\}$ be a partition of $\{1,\dots,m\}$
2) $\mathcal{Q} = \{C_1,\dots,C_q\}$ be a partition of $\{1,\dots,n\}$
3) $\mathcal{R} = \{D_1,\dots,D_r\}$ be a partition of $\{1,\dots,k\}$


(Partitions are defined formally later in this chapter.) Then it is a very useful fact
that matrix multiplication can be performed at the block level as well as at the
entry level. In particular, we have

$$[MN][B_i, D_j] = \sum_{C_h \in \mathcal{Q}} M[B_i, C_h]\, N[C_h, D_j]$$

When the partitions in question contain only single-element blocks, this is
precisely the usual formula for matrix multiplication

$$[MN]_{i,j} = \sum_{h=1}^{n} M_{i,h} N_{h,j}$$
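
The block formula can be checked numerically on a small case. The following Python sketch assumes matrices are stored as lists of rows and that index sets are 0-based lists; the helper names (`submatrix`, `block_of_product`) are hypothetical and chosen only for this illustration.

```python
def matmul(M, N):
    """Entry-level product of two matrices given as lists of rows."""
    return [[sum(M[i][h] * N[h][j] for h in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

def submatrix(M, rows, cols):
    """M[B, C]: keep only the rows in `rows` and the columns in `cols`."""
    return [[M[i][j] for j in cols] for i in rows]

def matadd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def block_of_product(M, N, B_i, D_j, Q):
    """Sum of M[B_i, C_h] N[C_h, D_j] over the blocks C_h of the partition Q."""
    total = [[0] * len(D_j) for _ in B_i]
    for C_h in Q:
        term = matmul(submatrix(M, B_i, C_h), submatrix(N, C_h, D_j))
        total = matadd(total, term)
    return total

M = [[1, 2, 3], [4, 5, 6]]        # 2 x 3
N = [[1, 0], [2, 1], [0, 3]]      # 3 x 2
Q = [[0, 2], [1]]                 # a partition of the inner index set {0, 1, 2}
B_i, D_j = [0, 1], [1]

lhs = submatrix(matmul(M, N), B_i, D_j)     # [MN][B_i, D_j]
rhs = block_of_product(M, N, B_i, D_j, Q)   # block-level sum
```

Note that the blocks of `Q` need not consist of consecutive indices; the formula only requires that they partition the inner index set.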




Block Matrices 


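It will be convenient to introduce the notational device of a block matrix. If $B_{i,j}$
are matrices of the appropriate sizes, then by the block matrix

$$M = \begin{bmatrix} B_{1,1} & B_{1,2} & \cdots & B_{1,n} \\ \vdots & \vdots & & \vdots \\ B_{m,1} & B_{m,2} & \cdots & B_{m,n} \end{bmatrix}_{\text{block}}$$

we mean the matrix whose upper left submatrix is $B_{1,1}$, and so on. Thus, the
$B_{i,j}$'s are submatrices of $M$ and not entries. A square matrix of the form

$$M = \begin{bmatrix} B_1 & 0 & \cdots & 0 \\ 0 & B_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & B_n \end{bmatrix}_{\text{block}}$$

where each $B_i$ is square and $0$ is a zero submatrix, is said to be a block
diagonal matrix.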


Elementary Row Operations 


Recall that there are three types of elementary row operations. Type 1 
operations consist of multiplying a row of A by a nonzero scalar. Type 2 
operations consist of interchanging two rows of A. Type 3 operations consist of 
adding a scalar multiple of one row of A to another row of A. 


If we perform an elementary operation of type $k$ to an identity matrix $I_n$, the
result is called an elementary matrix of type $k$. It is easy to see that all
elementary matrices are invertible.


In order to perform an elementary row operation on $A \in \mathcal{M}_{m,n}$ we can perform
that operation on the identity $I_m$, to obtain an elementary matrix $E$ and then take
the product $EA$. Note that multiplying on the right by $E$ has the effect of
performing column operations.
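
The relationship between row operations and elementary matrices can be sketched in Python as follows. Matrices are assumed to be lists of rows with 0-based indices, and the function names are illustrative rather than taken from the text.

```python
def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(M, N):
    return [[sum(M[i][h] * N[h][j] for h in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

def scale_row(A, i, c):
    """Type 1: multiply row i by the nonzero scalar c."""
    return [[c * x for x in row] if r == i else list(row)
            for r, row in enumerate(A)]

def swap_rows(A, i, j):
    """Type 2: interchange rows i and j."""
    B = [list(row) for row in A]
    B[i], B[j] = B[j], B[i]
    return B

def add_multiple(A, src, dst, c):
    """Type 3: add c times row src to row dst."""
    B = [list(row) for row in A]
    B[dst] = [x + c * y for x, y in zip(B[dst], B[src])]
    return B

A = [[1, 2, 3], [4, 5, 6]]

# Apply the operation to the identity I_2 to get E, then multiply on the left.
E = add_multiple(identity(2), 0, 1, -4)
via_product = matmul(E, A)
direct = add_multiple(A, 0, 1, -4)

# Multiplying on the right by an elementary matrix acts on columns instead.
E3 = swap_rows(identity(3), 0, 2)
columns_swapped = matmul(A, E3)
```

In this example `via_product` and `direct` agree, while `columns_swapped` is `A` with its first and third columns interchanged.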


Definition A matrix R is said to be in reduced row echelon form if 

1) All rows consisting only of 0's appear at the bottom of the matrix. 

2) In any nonzero row, the first nonzero entry is a 1. This entry is called a 
leading entry. 

3) For any two consecutive rows, the leading entry of the lower row is to the 
right of the leading entry of the upper row. 

4) Any column that contains a leading entry has 0's in all other positions. □


Here are the basic facts concerning reduced row echelon form. 




Theorem 0.2 Matrices $A, B \in \mathcal{M}_{m,n}$ are row equivalent, denoted by $A \sim B$,
if either one can be obtained from the other by a series of elementary row
operations.
1) Row equivalence is an equivalence relation. That is,
   a) $A \sim A$
   b) $A \sim B \Rightarrow B \sim A$
   c) $A \sim B,\ B \sim C \Rightarrow A \sim C$.
2) A matrix $A$ is row equivalent to one and only one matrix $R$ that is in
   reduced row echelon form. The matrix $R$ is called the reduced row
   echelon form of $A$. Furthermore,

   $$R = E_k \cdots E_1 A$$

   where $E_i$ are the elementary matrices required to reduce $A$ to reduced row
   echelon form.
3) $A$ is invertible if and only if its reduced row echelon form is an identity
   matrix. Hence, a matrix is invertible if and only if it is the product of
   elementary matrices. □
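
One way to compute the reduced row echelon form is Gauss-Jordan elimination. The sketch below is a minimal Python version that assumes the base field is $\mathbb{Q}$ and uses exact arithmetic from `fractions.Fraction`; the function names are illustrative, and the tests for row equivalence and invertibility simply restate parts 2) and 3) of the theorem.

```python
from fractions import Fraction

def rref(A):
    """Reduced row echelon form of A, computed with exact rational arithmetic."""
    R = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(R), len(R[0])
    pivot_row = 0
    for col in range(cols):
        # Look for a nonzero entry in this column at or below pivot_row.
        pr = next((r for r in range(pivot_row, rows) if R[r][col] != 0), None)
        if pr is None:
            continue
        R[pivot_row], R[pr] = R[pr], R[pivot_row]          # type 2 operation
        lead = R[pivot_row][col]
        R[pivot_row] = [x / lead for x in R[pivot_row]]    # type 1 operation
        for r in range(rows):                              # type 3 operations
            if r != pivot_row and R[r][col] != 0:
                factor = R[r][col]
                R[r] = [x - factor * y for x, y in zip(R[r], R[pivot_row])]
        pivot_row += 1
        if pivot_row == rows:
            break
    return R

def row_equivalent(A, B):
    """Matrices of the same size are row equivalent iff their RREFs coincide."""
    same_size = len(A) == len(B) and len(A[0]) == len(B[0])
    return same_size and rref(A) == rref(B)

def is_invertible(A):
    """A square matrix is invertible iff its RREF is the identity matrix."""
    n = len(A)
    if any(len(row) != n for row in A):
        return False
    return rref(A) == [[int(i == j) for j in range(n)] for i in range(n)]
```

Since the reduced row echelon form is unique, comparing the outputs of `rref` gives a decision procedure for row equivalence.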


The following definition is probably well known to the reader. 


Definition A square matrix is upper triangular if all of its entries below the 
main diagonal are 0. Similarly, a square matrix is lower triangular if all of its 
entries above the main diagonal are 0. A square matrix is diagonal if all of its 
entries off the main diagonal are 0. □


Determinants 


We assume that the reader is familiar with the following basic properties of 
determinants. 


Theorem 0.3 Let $A \in \mathcal{M}_n(F)$. Then $\det(A)$ is an element of $F$. Furthermore,
1) For any $B \in \mathcal{M}_n(F)$,

   $$\det(AB) = \det(A)\det(B)$$

2) $A$ is nonsingular (invertible) if and only if $\det(A) \ne 0$.
3) The determinant of an upper triangular or lower triangular matrix is the
   product of the entries on its main diagonal.
4) If a square matrix $M$ has the block diagonal form

   $$M = \begin{bmatrix} B_1 & 0 & \cdots & 0 \\ 0 & B_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & B_n \end{bmatrix}_{\text{block}}$$

   then $\det(M) = \prod_{i=1}^{n} \det(B_i)$. □




Polynomials 


The set of all polynomials in the variable $x$ with coefficients from a field $F$ is
denoted by $F[x]$. If $p(x) \in F[x]$, we say that $p(x)$ is a polynomial over $F$. If

$$p(x) = a_0 + a_1 x + \cdots + a_n x^n$$

is a polynomial with $a_n \ne 0$, then $a_n$ is called the leading coefficient of $p(x)$
and the degree of $p(x)$ is $n$, written $\deg p(x) = n$. For convenience, the degree
of the zero polynomial is $-\infty$. A polynomial is monic if its leading coefficient
is 1.


Theorem 0.4 (Division algorithm) Let $f(x), g(x) \in F[x]$ where $\deg g(x) > 0$.
Then there exist unique polynomials $q(x), r(x) \in F[x]$ for which

$$f(x) = q(x)g(x) + r(x)$$

where $r(x) = 0$ or $0 \le \deg r(x) < \deg g(x)$. □
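
The quotient and remainder can be computed by ordinary long division. The following Python sketch assumes $F = \mathbb{Q}$ and represents a polynomial as a list of coefficients with the constant term first (so the zero polynomial is the empty list); these conventions and the names `trim` and `poly_divmod` are assumptions made for the illustration.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients; the zero polynomial becomes []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_divmod(f, g):
    """Return (q, r) with f = q*g + r and either r == [] or deg r < deg g."""
    f = trim([Fraction(c) for c in f])
    g = trim([Fraction(c) for c in g])
    if not g:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 0)
    r = f
    while len(r) >= len(g):
        shift = len(r) - len(g)      # degree of the next quotient term
        coeff = r[-1] / g[-1]        # chosen to cancel the leading term of r
        q[shift] = coeff
        r = trim([c - coeff * g[i - shift] if i >= shift else c
                  for i, c in enumerate(r)])
    return trim(q), r

# (x^3 - 2x + 1) divided by (x - 1) gives x^2 + x - 1 with zero remainder.
q1, r1 = poly_divmod([1, -2, 0, 1], [-1, 1])
# (x^2 + 1) divided by (x + 1) gives x - 1 with remainder 2.
q2, r2 = poly_divmod([1, 0, 1], [1, 1])
```

Each pass through the loop lowers the degree of the running remainder, which is why the process terminates with $\deg r(x) < \deg g(x)$.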


If $p(x)$ divides $q(x)$, that is, if there exists a polynomial $f(x)$ for which

$$q(x) = f(x)p(x)$$

then we write $p(x) \mid q(x)$. A nonzero polynomial $p(x) \in F[x]$ is said to split
over $F$ if $p(x)$ can be written as a product of linear factors

$$p(x) = (x - r_1) \cdots (x - r_n)$$

where $r_i \in F$.

Theorem 0.5 Let $f(x), g(x) \in F[x]$. The greatest common divisor of $f(x)$ and
$g(x)$, denoted by $\gcd(f(x), g(x))$, is the unique monic polynomial $p(x)$ over $F$
for which

1) $p(x) \mid f(x)$ and $p(x) \mid g(x)$
2) if $r(x) \mid f(x)$ and $r(x) \mid g(x)$ then $r(x) \mid p(x)$.

Furthermore, there exist polynomials $a(x)$ and $b(x)$ over $F$ for which

$$\gcd(f(x), g(x)) = a(x)f(x) + b(x)g(x)$$ □


Definition The polynomials $f(x), g(x) \in F[x]$ are relatively prime if
$\gcd(f(x), g(x)) = 1$. In particular, $f(x)$ and $g(x)$ are relatively prime if and
only if there exist polynomials $a(x)$ and $b(x)$ over $F$ for which

$$a(x)f(x) + b(x)g(x) = 1$$ □
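
The polynomials $a(x)$ and $b(x)$ can be found with the extended Euclidean algorithm, which the text does not spell out here. The Python sketch below assumes $F = \mathbb{Q}$, coefficient lists with the constant term first, and inputs that are not both zero; all helper names are hypothetical.

```python
from fractions import Fraction

def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])

def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def divmod_poly(f, g):
    """Long division by a nonzero polynomial g: f = q*g + r with deg r < deg g."""
    q, r = [], trim(f)
    while len(r) >= len(g):
        term = [Fraction(0)] * (len(r) - len(g)) + [r[-1] / g[-1]]
        q = add(q, term)
        r = add(r, mul([-c for c in term], g))
    return q, r

def ext_gcd(f, g):
    """Return (d, a, b) with d monic, d = gcd(f, g) and d = a*f + b*g."""
    r0 = trim([Fraction(c) for c in f])
    r1 = trim([Fraction(c) for c in g])
    a0, a1 = [Fraction(1)], []
    b0, b1 = [], [Fraction(1)]
    while r1:
        q, r = divmod_poly(r0, r1)
        neg_q = [-c for c in q]
        r0, r1 = r1, r
        a0, a1 = a1, add(a0, mul(neg_q, a1))
        b0, b1 = b1, add(b0, mul(neg_q, b1))
    lead = r0[-1]                    # normalize so that the gcd is monic
    return ([c / lead for c in r0], [c / lead for c in a0],
            [c / lead for c in b0])

F1, G1 = [-1, 0, 1], [2, -3, 1]      # x^2 - 1 and x^2 - 3x + 2
d, a, b = ext_gcd(F1, G1)            # gcd is x - 1
F2, G2 = [0, 1], [1, 1]              # x and x + 1 are relatively prime
d2, a2, b2 = ext_gcd(F2, G2)
```

The loop maintains the invariants $r_0 = a_0 f + b_0 g$ and $r_1 = a_1 f + b_1 g$, so the final normalization yields the identity of Theorem 0.5.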


Definition A nonconstant polynomial $f(x) \in F[x]$ is irreducible if whenever
$f(x) = p(x)q(x)$, then one of $p(x)$ and $q(x)$ must be constant. □


The following two theorems support the view that irreducible polynomials
behave like prime numbers.


Theorem 0.6 A nonconstant polynomial $f(x)$ is irreducible if and only if it has
the property that whenever $f(x) \mid p(x)q(x)$, then either $f(x) \mid p(x)$ or
$f(x) \mid q(x)$. □


Theorem 0.7 Every nonconstant polynomial in $F[x]$ can be written as a product
of irreducible polynomials. Moreover, this expression is unique up to order of
the factors and multiplication by a scalar. □


Functions 

To set our notation, we should make a few comments about functions.

Definition Let $f: S \to T$ be a function from a set $S$ to a set $T$.

1) The domain of $f$ is the set $S$ and the range of $f$ is $T$.
2) The image of $f$ is the set $\operatorname{im}(f) = \{f(s) \mid s \in S\}$.
3) $f$ is injective (one-to-one), or an injection, if $x \ne y \Rightarrow f(x) \ne f(y)$.
4) $f$ is surjective (onto $T$), or a surjection, if $\operatorname{im}(f) = T$.
5) $f$ is bijective, or a bijection, if it is both injective and surjective.
6) Assuming that $0 \in T$, the support of $f$ is

   $$\operatorname{supp}(f) = \{s \in S \mid f(s) \ne 0\}$$ □


If $f: S \to T$ is injective, then its inverse $f^{-1}: \operatorname{im}(f) \to S$ exists and is well-
defined as a function on $\operatorname{im}(f)$.


It will be convenient to apply $f$ to subsets of $S$ and $T$. In particular, if $X \subseteq S$
and if $Y \subseteq T$, we set

$$f(X) = \{f(x) \mid x \in X\}$$

and

$$f^{-1}(Y) = \{s \in S \mid f(s) \in Y\}$$

Note that the latter is defined even if $f$ is not injective.
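
For finite sets these notions are easy to experiment with. The following Python sketch assumes the domain is given explicitly as a finite set and uses illustrative function names that do not come from the text.

```python
def image(f, X):
    """f(X) = {f(x) | x in X}."""
    return {f(x) for x in X}

def preimage(f, S, Y):
    """f^{-1}(Y) = {s in S | f(s) in Y}; defined even if f is not injective."""
    return {s for s in S if f(s) in Y}

def is_injective(f, S):
    """On a finite domain, f is injective iff no two elements share a value."""
    return len(image(f, S)) == len(set(S))

def support(f, S):
    """supp(f) = {s in S | f(s) != 0}."""
    return {s for s in S if f(s) != 0}

S = {-2, -1, 0, 1, 2}
square = lambda s: s * s
```

With these definitions, `preimage(square, S, {1, 4})` returns four elements even though `square` is not injective on `S`.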


Let $f: S \to T$. If $A \subseteq S$, the restriction of $f$ to $A$ is the function $f|_A: A \to T$
defined by

$$f|_A(a) = f(a)$$

for all $a \in A$. Clearly, the restriction of an injective map is injective.


In the other direction, if $f: S \to T$ and if $S \subseteq U$, then an extension of $f$ to $U$ is
a function $\bar{f}: U \to T$ for which $\bar{f}|_S = f$.




Equivalence Relations 


The concept of an equivalence relation plays a major role in the study of 
matrices and linear transformations. 


Definition Let $S$ be a nonempty set. A binary relation $\sim$ on $S$ is called an
equivalence relation on $S$ if it satisfies the following conditions:
1) (Reflexivity)

   $$a \sim a$$

   for all $a \in S$.
2) (Symmetry)

   $$a \sim b \Rightarrow b \sim a$$

   for all $a, b \in S$.
3) (Transitivity)

   $$a \sim b,\ b \sim c \Rightarrow a \sim c$$

   for all $a, b, c \in S$. □


Definition Let $\sim$ be an equivalence relation on $S$. For $a \in S$, the set of all
elements equivalent to $a$ is denoted by

$$[a] = \{b \in S \mid b \sim a\}$$

and called the equivalence class of $a$. □


Theorem 0.8 Let $\sim$ be an equivalence relation on $S$. Then
1) $b \in [a] \Leftrightarrow a \in [b] \Leftrightarrow [a] = [b]$
2) For any $a, b \in S$, we have either $[a] = [b]$ or $[a] \cap [b] = \emptyset$. □


Definition A partition of a nonempty set $S$ is a collection $\{A_1, \dots, A_n\}$ of
nonempty subsets of $S$, called the blocks of the partition, for which
1) $A_i \cap A_j = \emptyset$ for all $i \ne j$
2) $S = A_1 \cup \cdots \cup A_n$. □


The following theorem sheds considerable light on the concept of an 
equivalence relation. 


Theorem 0.9
1) Let $\sim$ be an equivalence relation on $S$. Then the set of distinct equivalence
   classes with respect to $\sim$ are the blocks of a partition of $S$.
2) Conversely, if $\mathcal{P}$ is a partition of $S$, the binary relation $\sim$ defined by

   $$a \sim b \text{ if } a \text{ and } b \text{ lie in the same block of } \mathcal{P}$$

   is an equivalence relation on $S$, whose equivalence classes are the blocks
   of $\mathcal{P}$.
This establishes a one-to-one correspondence between equivalence relations on
$S$ and partitions of $S$. □
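
Both directions of this correspondence can be sketched for a finite set. The Python code below assumes the relation is supplied as a two-argument predicate that really is an equivalence relation; the function names are illustrative.

```python
def equivalence_classes(S, related):
    """Split S into the equivalence classes of the relation `related`."""
    classes = []
    for a in S:
        for block in classes:
            # By transitivity it suffices to compare with one representative.
            if related(a, block[0]):
                block.append(a)
                break
        else:
            classes.append([a])
    return classes

def relation_from_partition(blocks):
    """The relation 'a and b lie in the same block' of a given partition."""
    index = {a: i for i, block in enumerate(blocks) for a in block}
    return lambda a, b: index[a] == index[b]

S = range(10)
congruent_mod_3 = lambda a, b: (a - b) % 3 == 0

P = equivalence_classes(S, congruent_mod_3)
same_block = relation_from_partition(P)
```

Going from the relation to the partition and back recovers the original relation, which is the content of the one-to-one correspondence.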


The most important problem related to equivalence relations is that of finding an 
efficient way to determine when two elements are equivalent. Unfortunately, in 
most cases, the definition does not provide an efficient test for equivalence and 
so we are led to the following concepts. 


Definition Let $\sim$ be an equivalence relation on $S$. A function $f: S \to T$, where
$T$ is any set, is called an invariant of $\sim$ if it is constant on the equivalence
classes of $\sim$, that is,

$$a \sim b \Rightarrow f(a) = f(b)$$

and a complete invariant if it is constant and distinct on the equivalence
classes of $\sim$, that is,

$$a \sim b \Leftrightarrow f(a) = f(b)$$

A collection $\{f_1, \dots, f_n\}$ of invariants is called a complete system of
invariants if

$$a \sim b \Leftrightarrow f_i(a) = f_i(b) \text{ for all } i = 1, \dots, n$$ □


Definition Let $\sim$ be an equivalence relation on $S$. A subset $C \subseteq S$ is said to be
a set of canonical forms (or just a canonical form) for $\sim$ if for every $s \in S$,
there is exactly one $c \in C$ such that $c \sim s$. Put another way, each equivalence
class under $\sim$ contains exactly one member of $C$. □


Example 0.1 Define a binary relation $\sim$ on $F[x]$ by letting $p(x) \sim q(x)$ if and
only if $p(x) = aq(x)$ for some nonzero constant $a \in F$. This is easily seen to be
an equivalence relation. The function that assigns to each polynomial its degree
is an invariant, since

$$p(x) \sim q(x) \Rightarrow \deg(p(x)) = \deg(q(x))$$

However, it is not a complete invariant, since there are inequivalent polynomials
with the same degree. The set of all monic polynomials is a set of canonical
forms for this equivalence relation. □
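
A canonical form gives a practical test for equivalence: reduce both elements to canonical form and compare. The Python sketch below does this for the relation of Example 0.1, assuming $F = \mathbb{Q}$, nonzero polynomials, and coefficient lists with the constant term first and a nonzero last entry; the function names are hypothetical.

```python
from fractions import Fraction

def canonical_form(p):
    """The monic polynomial in the class of a nonzero polynomial p."""
    lead = Fraction(p[-1])           # leading coefficient, assumed nonzero
    return [Fraction(c) / lead for c in p]

def equivalent(p, q):
    """p ~ q iff p = a*q for a nonzero constant a, iff same monic form."""
    return canonical_form(p) == canonical_form(q)

def degree(p):
    return len(p) - 1

p = [2, 4]    # 4x + 2
q = [1, 2]    # 2x + 1, a scalar multiple of p
r = [1, 1]    # x + 1, same degree as p but not a scalar multiple
```

Here `p` and `r` have the same degree without being equivalent, which illustrates why the degree is an invariant but not a complete invariant.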


Example 0.2 We have remarked that row equivalence is an equivalence relation
on $\mathcal{M}_{m,n}(F)$. Moreover, the subset of reduced row echelon form matrices is a
set of canonical forms for row equivalence, since every matrix is row equivalent
to a unique matrix in reduced row echelon form. □




Example 0.3 Two matrices A, B E€ M, (F) are row equivalent if and only if 
there is an invertible matrix P such that A = PB. Similarly, A and B are 
column equivalent, that is, A can be reduced to B using elementary column 
operations, if and only if there exists an invertible matrix Q such that A = BQ. 


Two matrices A and B are said to be equivalent if there exist invertible 
matrices P and Q for which 


A=PBQ 


Put another way, A and B are equivalent if A can be reduced to B by 
performing a series of elementary row and/or column operations. (The use of the 
term equivalent is unfortunate, since it applies to all equivalence relations, not 
just this one. However, the terminology is standard, so we use it here.) 


It is not hard to see that an m × n matrix R that is in both reduced row echelon
form and reduced column echelon form must have the block form


J_k = [ I_k         0_{k,n−k}   ]
      [ 0_{m−k,k}   0_{m−k,n−k} ]


We leave it to the reader to show that every matrix A in M_{m,n} is equivalent to
exactly one matrix of the form J_k and so the set of these matrices is a set of
canonical forms for equivalence. Moreover, the function f defined by
f(A) = k, where A ~ J_k, is a complete invariant for equivalence.


Since the rank of J_k is k and since neither row nor column operations affect the
rank, we deduce that the rank of A is k. Hence, rank is a complete invariant for
equivalence. In other words, two matrices are equivalent if and only if they have
the same rank. □
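The conclusion of Example 0.3 can be checked numerically. The following sketch, assuming matrices over Q stored as lists of rows, computes the rank by Gaussian elimination and builds the corresponding canonical form J_k. The function names rank and canonical_form are ours, chosen for illustration.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix over Q, by Gaussian elimination on a copy."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    ncols = len(rows[0]) if rows else 0
    rk = 0
    for col in range(ncols):
        # look for a pivot in this column at or below row rk
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, len(rows)):
            factor = rows[i][col] / rows[rk][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk

def canonical_form(matrix):
    """The m x n block matrix J_k, where k is the rank of the given matrix."""
    m, n, k = len(matrix), len(matrix[0]), rank(matrix)
    return [[1 if i == j and i < k else 0 for j in range(n)] for i in range(m)]

A = [[1, 2, 3], [2, 4, 6]]   # rank 1
B = [[0, 0, 5], [0, 0, 0]]   # rank 1, so A and B are equivalent
C = [[1, 0, 0], [0, 1, 0]]   # rank 2, so C is not equivalent to A
```

Two matrices of the same size are equivalent precisely when canonical_form returns the same matrix for both.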


Example 0.4 Two matrices A, B ∈ M_n(F) are said to be similar if there exists
an invertible matrix P such that


A = PBP^{-1}


Similarity is easily seen to be an equivalence relation on M_n. As we will learn,
two matrices are similar if and only if they represent the same linear operators 
on a given n-dimensional vector space V. Hence, similarity is extremely 
important for studying the structure of linear operators. One of the main goals of 
this book is to develop canonical forms for similarity. 


We leave it to the reader to show that the determinant function and the trace 
function are invariants for similarity. However, these two invariants do not, in 
general, form a complete system of invariants. □
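One way to see that trace and determinant are not a complete system of invariants is to compare the 2 × 2 identity matrix with a shear matrix. The sketch below is our own illustration, with hypothetical helper names, working over Q. It relies on the observation that P I P^{-1} = I for every invertible P, so the identity matrix is similar only to itself.

```python
from fractions import Fraction

def trace(m):
    return m[0][0] + m[1][1]

def det(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse(m):
    """Inverse of an invertible 2 x 2 matrix, by the adjugate formula."""
    d = Fraction(det(m))
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

I2 = [[1, 0], [0, 1]]
N = [[1, 1], [0, 1]]    # same trace and determinant as I2
P = [[2, 1], [1, 1]]    # an arbitrarily chosen invertible matrix

conjugate_of_I2 = matmul(matmul(P, I2), inverse(P))   # always I2 again
conjugate_of_N = matmul(matmul(P, N), inverse(P))     # similar to N
```

Since N ≠ I2 while both have trace 2 and determinant 1, the two invariants agree on matrices that are not similar.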


Example 0.5 Two matrices A, B ∈ M_n(F) are said to be congruent if there
exists an invertible matrix P for which




A = PBP^t


where P^t is the transpose of P. This relation is easily seen to be an equivalence
relation and we will devote some effort to finding canonical forms for 
congruence. For some base fields F (such as R, C or a finite field), this is 
relatively easy to do, but for other base fields (such as Q), it is extremely 
difficult. 


Zorn's Lemma 
In order to show that any vector space has a basis, we require a result known as
Zorn's lemma. To state this lemma, we need some preliminary definitions.


Definition A partially ordered set is a pair (P, ≤) where P is a nonempty set
and ≤ is a binary relation called a partial order, read “less than or equal to,”
with the following properties:

1) (Reflexivity) For all a ∈ P,

   a ≤ a

2) (Antisymmetry) For all a, b ∈ P,

   a ≤ b and b ≤ a implies a = b

3) (Transitivity) For all a, b, c ∈ P,

   a ≤ b and b ≤ c implies a ≤ c

Partially ordered sets are also called posets. □


It is customary to use a phrase such as “Let P be a partially ordered set” when 
the partial order is understood. Here are some key terms related to partially 
ordered sets. 


Definition Let P be a partially ordered set.

1) The maximum (largest, top) element of P, should it exist, is an element
   M ∈ P with the property that all elements of P are less than or equal to
   M, that is,

   p ∈ P ⇒ p ≤ M

   Similarly, the minimum (least, smallest, bottom) element of P, should it
   exist, is an element N ∈ P with the property that all elements of P are
   greater than or equal to N, that is,

   p ∈ P ⇒ N ≤ p

2) A maximal element is an element m ∈ P with the property that there is no
   larger element in P, that is,

   p ∈ P, m ≤ p ⇒ m = p

   Similarly, a minimal element is an element n ∈ P with the property that
   there is no smaller element in P, that is,

   p ∈ P, p ≤ n ⇒ p = n

3) Let a, b ∈ P. Then u ∈ P is an upper bound for a and b if

   a ≤ u and b ≤ u

   The unique smallest upper bound for a and b, if it exists, is called the least
   upper bound of a and b and is denoted by lub{a, b}.
4) Let a, b ∈ P. Then ℓ ∈ P is a lower bound for a and b if

   ℓ ≤ a and ℓ ≤ b

   The unique largest lower bound for a and b, if it exists, is called the
   greatest lower bound of a and b and is denoted by glb{a, b}. □


Let S be a subset of a partially ordered set P. We say that an element u ∈ P is
an upper bound for S if s ≤ u for all s ∈ S. Lower bounds are defined
similarly.


Note that in a partially ordered set, it is possible that not all elements are
comparable. In other words, it is possible to have x, y ∈ P with the property
that x ≰ y and y ≰ x.


Definition A partially ordered set in which every pair of elements is 
comparable is called a totally ordered set, or a linearly ordered set. Any 
totally ordered subset of a partially ordered set P is called a chain in P. □


Example 0.6 

1) The set R of real numbers, with the usual binary relation ≤, is a partially
ordered set. It is also a totally ordered set. It has no maximal elements.

2) The set N = {0,1,...} of natural numbers, together with the binary 
relation of divides, is a partially ordered set. It is customary to write n | m 
to indicate that n divides m. The subset S of N consisting of all powers of 2 
is a totally ordered subset of N, that is, it is a chain in N. The set 
P = {2,4,8,3,9,27} is a partially ordered set under | . It has two maximal 
elements, namely 8 and 27. The subset Q = {2,3,5,7,11} is a partially 
ordered set in which every element is both maximal and minimal! 

3) Let S be any set and let P(S) be the power set of S, that is, the set of all
subsets of S. Then P(S), together with the subset relation ⊆, is a partially
ordered set. □
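The claims in part 2) of Example 0.6 are easy to verify by brute force. The sketch below assumes a finite poset given as a list of positive integers together with a comparison function; the helper names are illustrative and not taken from the text.

```python
def divides(n, m):
    """n | m for positive integers n and m."""
    return m % n == 0

def maximal_elements(elements, leq):
    """Elements m such that m <= p forces p = m."""
    return [m for m in elements
            if all(not leq(m, p) or p == m for p in elements)]

def minimal_elements(elements, leq):
    """Elements n such that p <= n forces p = n."""
    return [n for n in elements
            if all(not leq(p, n) or p == n for p in elements)]

def is_chain(elements, leq):
    """True when every pair of elements is comparable."""
    return all(leq(a, b) or leq(b, a) for a in elements for b in elements)

P = [2, 4, 8, 3, 9, 27]
Q = [2, 3, 5, 7, 11]
powers_of_two = [1, 2, 4, 8, 16]
```

Note that P has two maximal elements but no maximum, which illustrates the difference between the two notions.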


Now we can state Zorn's lemma, which gives a condition under which a 
partially ordered set has a maximal element. 




Theorem 0.10 (Zorn's lemma) If P is a partially ordered set in which every
chain has an upper bound, then P has a maximal element. □


We will use Zorn's lemma to prove that every vector space has a basis. Zorn's 
lemma is equivalent to the famous axiom of choice. As such, it is not subject to 
proof from the other axioms of ordinary (ZF) set theory. Zorn's lemma has many 
important equivalents, one of which is the well-ordering principle. A well
ordering on a nonempty set X is a total order on X with the property that every 
nonempty subset of X has a least element. 


Theorem 0.11 (Well-ordering principle) Every nonempty set has a well 
ordering. □


Cardinality 
Two sets S and T have the same cardinality, written 
|S| = |T|


if there is a bijective function (a one-to-one correspondence) between the sets. 
The reader is probably aware of the fact that 


|Z| = |N| and |Q| = |N| 


where N denotes the natural numbers, Z the integers and Q the rational 
numbers. 


If S is in one-to-one correspondence with a subset of T, we write |S| ≤ |T|. If
S is in one-to-one correspondence with a proper subset of T but not with T itself,
then we write |S| < |T|. The second condition is necessary, since, for instance,
N is in one-to-one correspondence with a proper subset of Z and yet N is also in
one-to-one correspondence with Z itself. Hence, |N| = |Z|.


This is not the place to enter into a detailed discussion of cardinal numbers. The 
intention here is that the cardinality of a set, whatever that is, represents the 
“size” of the set. It is actually easier to talk about two sets having the same, or 
different, size (cardinality) than it is to explicitly define the size (cardinality) of 
a given set. 


Be that as it may, we associate to each set S a cardinal number, denoted by |S| 
or card(S), that is intended to measure the size of the set. Actually, cardinal 
numbers are just very special types of sets. However, we can simply think of 
them as vague amorphous objects that measure the size of sets. 


Definition
1) A set is finite if it can be put in one-to-one correspondence with a set of the
   form Z_n = {0, 1, ..., n − 1}, for some nonnegative integer n. A set that is
   not finite is infinite. The cardinal number (or cardinality) of a finite set is
   just the number of elements in the set.

2) The cardinal number of the set N of natural numbers is ℵ_0 (read “aleph
   nought”), where ℵ is the first letter of the Hebrew alphabet. Hence,

   |N| = |Z| = |Q| = ℵ_0

3) Any set with cardinality ℵ_0 is called a countably infinite set and any finite
   or countably infinite set is called a countable set. An infinite set that is not
   countable is said to be uncountable. □


Since it can be shown that |R| > |N|, the real numbers are uncountable.


If S and T are finite sets, then it is well known that

|S| ≤ |T| and |T| ≤ |S| ⇒ |S| = |T|

The first part of the next theorem tells us that this is also true for infinite sets.


The reader will no doubt recall that the power set P(S) of a set S is the set of
all subsets of S. For finite sets, the power set of S is always bigger than the set
itself. In fact,


|S| = n ⇒ |P(S)| = 2^n


The second part of the next theorem says that the power set of any set S is
bigger (has larger cardinality) than S itself. On the other hand, the third part of
this theorem says that, for infinite sets S, the set of all finite subsets of S is the
same size as S.


Theorem 0.12
1) (Schröder-Bernstein theorem) For any sets S and T,

   |S| ≤ |T| and |T| ≤ |S| ⇒ |S| = |T|

2) (Cantor's theorem) If P(S) denotes the power set of S, then

   |S| < |P(S)|

3) If P_0(S) denotes the set of all finite subsets of S and if S is an infinite set,
   then

   |S| = |P_0(S)|


Proof. We prove only parts 1) and 2). Let f: S → T be an injective function
from S into T and let g: T → S be an injective function from T into S. We
want to use these functions to create a bijective function from S to T. For this
purpose, we make the following definitions. The descendants of an element
s ∈ S are the elements obtained by repeated alternate applications of the
functions f and g, namely




f(s), g(f(s)), f(g(f(s))), ...


If t is a descendant of s, then s is an ancestor of t. Descendants and ancestors 
of elements of T are defined similarly. 


Now, by tracing an element's ancestry to its beginning, we find that there are
three possibilities: the element may originate in S, or in T, or it may have no
point of origin. Accordingly, we can write S as the union of three disjoint sets

S_S = {s ∈ S | s originates in S}

S_T = {s ∈ S | s originates in T}

S_∞ = {s ∈ S | s has no originator}


Similarly, T is the disjoint union of T_S, T_T and T_∞.


Now, the restriction

f|_{S_S}: S_S → T_S


is a bijection. To see this, note that if t ∈ T_S, then t originated in S and
therefore must have the form f(s) for some s ∈ S. But t and its ancestor s have
the same point of origin and so t ∈ T_S implies s ∈ S_S. Thus, f|_{S_S} is surjective
and, being a restriction of the injective map f, it is bijective. We leave it to the
reader to show that the functions


(g|_{T_T})^{-1}: S_T → T_T and f|_{S_∞}: S_∞ → T_∞


are also bijections. Putting these three bijections together gives a bijection
between S and T. Hence, |S| = |T|, as desired.


We now prove Cantor's theorem. The map ι: S → P(S) defined by ι(s) = {s}
is an injection from S to P(S) and so |S| ≤ |P(S)|. To complete the proof we
must show that no injective map f: S → P(S) can be surjective. To this end, let


X = {s ∈ S | s ∉ f(s)} ∈ P(S)


We claim that X is not in im(f). For suppose that X = f(x) for some x ∈ S.
Then if x ∈ X, we have by the definition of X that x ∉ X. On the other hand, if
x ∉ X, we have again by the definition of X that x ∈ X. This contradiction
implies that X ∉ im(f) and so f is not surjective.
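For a small finite set, the diagonal argument can be confirmed exhaustively. The sketch below is an illustration of ours, with hypothetical names: it enumerates every map f: S → P(S) for a three-element set S and checks that the set X built in the proof is never a value of f. Injectivity of f is not needed for the check.

```python
from itertools import combinations, product

def power_set(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c)
            for size in range(len(items) + 1)
            for c in combinations(items, size)]

def diagonal_set(s, f):
    """X = {x in S | x not in f(x)} for a map f: S -> P(S) given as a dict."""
    return frozenset(x for x in s if x not in f[x])

S = {0, 1, 2}
subsets = power_set(S)

# Enumerate all 8^3 = 512 maps f: S -> P(S) and test whether X is ever hit.
diagonal_always_missed = all(
    diagonal_set(S, dict(zip(sorted(S), images))) not in images
    for images in product(subsets, repeat=len(S))
)
```

The flag diagonal_always_missed comes out true, in agreement with the proof, so no map from S to P(S) is surjective.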


Cardinal Arithmetic 


Now let us define addition, multiplication and exponentiation of cardinal
numbers. If S and T are sets, the cartesian product S × T is the set of all
ordered pairs


S × T = {(s, t) | s ∈ S, t ∈ T}


The set of all functions from T to S is denoted by S^T.




Definition Let κ and λ denote cardinal numbers. Let S and T be disjoint sets
for which |S| = κ and |T| = λ.

1) The sum κ + λ is the cardinal number of S ∪ T.

2) The product κλ is the cardinal number of S × T.

3) The power κ^λ is the cardinal number of S^T. □


We will not go into the details of why these definitions make sense. (For 
instance, they seem to depend on the sets S and T, but in fact they do not.) It 
can be shown, using these definitions, that cardinal addition and multiplication 
are associative and commutative and that multiplication distributes over 
addition. 


Theorem 0.13 Let κ, λ and μ be cardinal numbers. Then the following
properties hold:
1) (Associativity)

   κ + (λ + μ) = (κ + λ) + μ and κ(λμ) = (κλ)μ

2) (Commutativity)

   κ + λ = λ + κ and κλ = λκ

3) (Distributivity)

   κ(λ + μ) = κλ + κμ

4) (Properties of Exponents)
   a) κ^{λ+μ} = κ^λ κ^μ
   b) (κ^λ)^μ = κ^{λμ}
   c) (κλ)^μ = κ^μ λ^μ □


On the other hand, the arithmetic of cardinal numbers can seem a bit strange, as 
the next theorem shows. 


Theorem 0.14 Let κ and λ be nonzero cardinal numbers, at least one of which is
infinite. Then


κ + λ = κλ = max{κ, λ} □


It is not hard to see that there is a one-to-one correspondence between the power 
set P(S) of a set S and the set of all functions from S to {0,1}. This leads to 
the following theorem. 


Theorem 0.15 For any cardinal κ
1) If |S| = κ, then |P(S)| = 2^κ
2) κ < 2^κ □




We have already observed that |N| = ℵ_0. It can be shown that ℵ_0 is the smallest
infinite cardinal, that is,


κ < ℵ_0 ⇒ κ is a natural number


It can also be shown that the set R of real numbers is in one-to-one
correspondence with the power set P(N) of the natural numbers. Therefore,


|R| = 2^{ℵ_0}

The set of all points on the real line is sometimes called the continuum and so
2^{ℵ_0} is sometimes called the power of the continuum and denoted by c.


Theorem 0.14 shows that cardinal addition and multiplication have a kind of 
“absorption” quality, which makes it hard to produce larger cardinals from 
smaller ones. The next theorem demonstrates this more dramatically. 


Theorem 0.16

1) Addition applied a countable number of times or multiplication applied a
   finite number of times to the cardinal number ℵ_0 does not yield anything
   more than ℵ_0. Specifically, for any nonzero n ∈ N, we have

   ℵ_0 · ℵ_0 = ℵ_0 and (ℵ_0)^n = ℵ_0

2) Addition and multiplication applied a countable number of times to the
   cardinal number 2^{ℵ_0} does not yield more than 2^{ℵ_0}. Specifically, we have

   ℵ_0 · 2^{ℵ_0} = 2^{ℵ_0} and (2^{ℵ_0})^{ℵ_0} = 2^{ℵ_0} □

Using this theorem, we can establish other relationships, such as

2^{ℵ_0} ≤ (ℵ_0)^{ℵ_0} ≤ (2^{ℵ_0})^{ℵ_0} = 2^{ℵ_0}

which, by the Schröder-Bernstein theorem, implies that

(ℵ_0)^{ℵ_0} = 2^{ℵ_0}


We mention that the problem of evaluating κ^λ in general is a very difficult one
and would take us far beyond the scope of this book.


We will have use for the following reasonable-sounding result, whose proof is 
omitted. 


Theorem 0.17 Let {A_k | k ∈ K} be a collection of sets, indexed by the set K,
with |K| = κ. If |A_k| ≤ λ for all k ∈ K, then


|⋃_{k∈K} A_k| ≤ λκ □


Let us conclude by describing the cardinality of some famous sets. 




Theorem 0.18
1) The following sets have cardinality ℵ_0.
   a) The rational numbers Q.
   b) The set of all finite subsets of N.
   c) The union of a countable number of countable sets.
   d) The set Z^n of all ordered n-tuples of integers.
2) The following sets have cardinality 2^{ℵ_0}.
   a) The set of all points in R^n.
   b) The set of all infinite sequences of natural numbers.
   c) The set of all infinite sequences of real numbers.
   d) The set of all finite subsets of R.
   e) The set of all irrational numbers. □


Part 2 Algebraic Structures 


We now turn to a discussion of some of the many algebraic structures that play a 
role in the study of linear algebra. 


Groups 


Definition A group is a nonempty set G, together with a binary operation
denoted by *, that satisfies the following properties:
1) (Associativity) For all a, b, c ∈ G,

   (a*b)*c = a*(b*c)

2) (Identity) There exists an element e ∈ G for which

   e*a = a*e = a

   for all a ∈ G.
3) (Inverses) For each a ∈ G, there is an element a^{-1} ∈ G for which

   a*a^{-1} = a^{-1}*a = e □


Definition A group G is abelian, or commutative, if

a*b = b*a

for all a, b ∈ G. When a group is abelian, it is customary to denote the
operation * by +, thus writing a*b as a + b. It is also customary to refer to the
identity as the zero element and to denote the inverse a^{-1} by −a, referred to as
the negative of a. □


Example 0.7 The set F of all bijective functions from a set S to S is a group 
under composition of functions. However, in general, it is not abelian. 


Example 0.8 The set M_{m,n}(F) is an abelian group under addition of matrices.
The identity is the zero matrix 0_{m,n} of size m × n. The set M_n(F) is not a
group under multiplication of matrices, since not all matrices have multiplicative
inverses. However, the set of invertible matrices of size n × n is a (nonabelian)
group under multiplication. □


A group G is finite if it contains only a finite number of elements. The
cardinality of a finite group G is called its order and is denoted by o(G) or
simply |G|. Thus, for example, Z_n = {0, 1, ..., n − 1} is a finite group under
addition modulo n, but M_{m,n}(R) is not finite.


Definition A subgroup of a group G is a nonempty subset S of G that is a
group in its own right, using the same operations as defined on G. □

Cyclic Groups


If a is a formal symbol, we can define a group G to be the set of all integral
powers of a:


G = {a^i | i ∈ Z}

where the product is defined by the formal rules of exponents:

a^i a^j = a^{i+j}


This group is denoted by ⟨a⟩ and called the cyclic group generated by a. The
identity of ⟨a⟩ is 1 = a^0. In general, a group G is cyclic if it has the form
G = ⟨a⟩ for some a ∈ G.


We can also create a finite group C_n(a) of arbitrary positive order n by
declaring that a^n = 1. Thus,


C_n(a) = {1 = a^0, a, a^2, ..., a^{n−1}}

where the product is defined by the formal rules of exponents, followed by
reduction modulo n:

a^i a^j = a^{(i+j) mod n}

This defines a group of order n, called a cyclic group of order n. The inverse
of a^k is a^{(−k) mod n}.
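The finite cyclic group C_n(a) can be modeled by exponents alone. In the sketch below, which uses names of our own choosing, the element a^i is represented by the integer i in {0, ..., n − 1}, so that the product and inverse follow the two formulas above.

```python
def cyclic_group(n):
    """Model C_n(a): the element a^i is represented by its exponent i."""
    elements = list(range(n))

    def multiply(i, j):
        # a^i a^j = a^((i + j) mod n)
        return (i + j) % n

    def inverse(k):
        # the inverse of a^k is a^((-k) mod n)
        return (-k) % n

    return elements, multiply, inverse

elements, multiply, inverse = cyclic_group(6)
product_of_a4_and_a5 = multiply(4, 5)   # exponent of a^4 a^5 in C_6(a)
inverse_of_a2 = inverse(2)              # exponent of the inverse of a^2
```

The exponent 0 plays the role of the identity 1 = a^0, and every element multiplied by its inverse returns it.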
Rings

Definition A ring is a nonempty set R, together with two binary operations,
called addition (denoted by +) and multiplication (denoted by juxtaposition),
for which the following hold:

1) R is an abelian group under addition
2) (Associativity) For all a, b, c ∈ R,

   (ab)c = a(bc)

3) (Distributivity) For all a, b, c ∈ R,

   (a + b)c = ac + bc and c(a + b) = ca + cb

A ring R is said to be commutative if ab = ba for all a, b ∈ R. If a ring R
contains an element e with the property that

ae = ea = a

for all a ∈ R, we say that R is a ring with identity. The identity e is usually
denoted by 1. □


A field F is a commutative ring with identity in which each nonzero element 
has a multiplicative inverse, that is, if a € F is nonzero, then there is a b € F 
for which ab = 1. 


Example 0.9 The set Z_n = {0, 1, ..., n − 1} is a commutative ring under
addition and multiplication modulo n


a ⊕ b = (a + b) mod n, a ⊙ b = ab mod n


The element 1 ∈ Z_n is the identity. □


Example 0.10 The set E of even integers is a commutative ring under the usual 
operations on Z, but it has no identity. 


Example 0.11 The set M_n(F) is a noncommutative ring under matrix addition
and multiplication. The identity matrix I_n is the identity for M_n(F). □


Example 0.12 Let F be a field. The set F[x] of all polynomials in a single
variable x, with coefficients in F, is a commutative ring under the usual
operations of polynomial addition and multiplication. What is the identity for
F[x]? Similarly, the set F[x_1, ..., x_n] of polynomials in n variables is a
commutative ring under the usual addition and multiplication of polynomials. □


Definition If R and S are rings, then a function σ: R → S is a ring
homomorphism if

σ(a + b) = σ(a) + σ(b)
σ(ab) = σ(a)σ(b)

for all a, b ∈ R. □


Definition A subring of a ring R is a subset S of R that is a ring in its own 
right, using the same operations as defined on R and having the same 
multiplicative identity as R. □




The condition that a subring S have the same multiplicative identity as R is
required. For example, the set S of all 2 × 2 matrices of the form


A_a = [ a  0 ]
      [ 0  0 ]


for a ∈ F is a ring under addition and multiplication of matrices (isomorphic to
F). The multiplicative identity in S is the matrix A_1, which is not the identity I_2
of M_{2,2}(F). Hence, S is a ring under the same operations as M_{2,2}(F) but it is
not a subring of M_{2,2}(F).
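A quick computation confirms the behavior of the matrices A_a. The sketch below, with helper names of our own, represents 2 × 2 matrices as nested tuples of integers and checks that A_1 acts as an identity on S even though it differs from I_2.

```python
def A(a):
    """The matrix A_a with a in the upper left corner and zeros elsewhere."""
    return ((a, 0), (0, 0))

def matmul(x, y):
    """Product of two 2 x 2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

I2 = ((1, 0), (0, 1))
left = matmul(A(1), A(5))    # A_1 A_5
right = matmul(A(5), A(1))   # A_5 A_1
product = matmul(A(2), A(3)) # A_2 A_3, which should be A_6
```

Both left and right equal A(5), so A_1 is the multiplicative identity of S, while A(1) and I2 are different matrices.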


Applying the definition is not generally the easiest way to show that a subset of 
a ring is a subring. The following characterization is usually easier to apply. 


Theorem 0.19 A nonempty subset S of a ring R is a subring if and only if
1) The multiplicative identity 1_R of R is in S
2) S is closed under subtraction, that is,

   a, b ∈ S ⇒ a − b ∈ S

3) S is closed under multiplication, that is,

   a, b ∈ S ⇒ ab ∈ S □


Ideals 
Rings have another important substructure besides subrings. 


Definition Let R be a ring. A nonempty subset ℐ of R is called an ideal if
1) ℐ is a subgroup of the abelian group R, that is, ℐ is closed under
   subtraction:

   a, b ∈ ℐ ⇒ a − b ∈ ℐ

2) ℐ is closed under multiplication by any ring element, that is,

   a ∈ ℐ, r ∈ R ⇒ ar ∈ ℐ and ra ∈ ℐ □

Note that if an ideal ℐ contains the unit element 1, then ℐ = R.

Example 0.13 Let p(x) be a polynomial in F[x]. The set of all multiples of
p(x),

⟨p(x)⟩ = {a(x)p(x) | a(x) ∈ F[x]}


is an ideal in F[x], called the ideal generated by p(x). □


Definition Let S be a subset of a ring R with identity. The set


⟨S⟩ = {r_1 s_1 + ··· + r_n s_n | r_i ∈ R, s_i ∈ S, n ≥ 1}


of all finite linear combinations of elements of S, with coefficients in R, is an
ideal in R, called the ideal generated by S. It is the smallest (in the sense of set
inclusion) ideal of R containing S. If S = {s_1, ..., s_n} is a finite set, we write


⟨s_1, ..., s_n⟩ = {r_1 s_1 + ··· + r_n s_n | r_i ∈ R, s_i ∈ S} □


Note that in the previous definition, we require that R have an identity. This is 
to ensure that S ⊆ ⟨S⟩.


Theorem 0.20 Let R be a ring.

1) The intersection of any collection {ℐ_k | k ∈ K} of ideals is an ideal.

2) If ℐ_1 ⊆ ℐ_2 ⊆ ··· is an ascending sequence of ideals, each one contained in
   the next, then the union ⋃ ℐ_k is also an ideal.

3) More generally, if

   C = {ℐ_i | i ∈ I}

   is a chain of ideals in R, then the union J = ⋃_{i∈I} ℐ_i is also an ideal in R.

Proof. To prove 1), let J = ⋂ ℐ_k. Then if a, b ∈ J, we have a, b ∈ ℐ_k for all
k ∈ K. Hence, a − b ∈ ℐ_k for all k ∈ K and so a − b ∈ J. Hence, J is closed
under subtraction. Also, if r ∈ R, then ra ∈ ℐ_k for all k ∈ K and so ra ∈ J. Of
course, part 2) is a special case of part 3). To prove 3), if a, b ∈ J, then a ∈ ℐ_i
and b ∈ ℐ_j for some i, j ∈ I. Since one of ℐ_i and ℐ_j is contained in the other, we
may assume that ℐ_i ⊆ ℐ_j. It follows that a, b ∈ ℐ_j and so a − b ∈ ℐ_j ⊆ J and if
r ∈ R, then ra ∈ ℐ_j ⊆ J. Thus J is an ideal. □


Note that in general, the union of ideals is not an ideal. However, as we have 
just proved, the union of any chain of ideals is an ideal. 


Quotient Rings and Maximal Ideals 


Let S be a subgroup of the additive group of a commutative ring R with
identity. Let ≡ be the binary relation on R defined by


a ≡ b ⇔ a − b ∈ S


It is easy to see that ≡ is an equivalence relation. When a ≡ b, we say that a
and b are congruent modulo S. The term “mod” is used as a colloquialism for
modulo and a ≡ b is often written


a ≡ b mod S


As shorthand, we write a ≡ b.




To see what the equivalence classes look like, observe that

[a] = {r ∈ R | r ≡ a}
    = {r ∈ R | r − a ∈ S}
    = {r ∈ R | r = a + s for some s ∈ S}
    = {a + s | s ∈ S}
    = a + S


The set

a + S = {a + s | s ∈ S}

is called a coset of S in R. The element a is called a coset representative for
a + S.

Thus, the equivalence classes for congruence mod S are the cosets a + S of S
in R. The set of all cosets is denoted by

R/S = {a + S | a ∈ R}


This is read “R mod S.” We would like to place a ring structure on R/S.
Indeed, if S is a subgroup of the abelian group R, then R/S is easily seen to be
an abelian group as well under coset addition defined by


(a + S) + (b + S) = (a + b) + S

In order for the product

(a + S)(b + S) = ab + S

to be well-defined, we must have

b + S = b' + S ⇒ ab + S = ab' + S

or, equivalently,

b − b' ∈ S ⇒ a(b − b') ∈ S

But b − b' may be any element of S and a may be any element of R and so this
condition implies that S must be an ideal. Conversely, if S is an ideal, then
coset multiplication is well defined.


Theorem 0.21 Let R be a commutative ring with identity. Then the quotient
R/ℐ is a ring under coset addition and multiplication if and only if ℐ is an
ideal of R. In this case, R/ℐ is called the quotient ring of R modulo ℐ, where
addition and multiplication are defined by


(a + ℐ) + (b + ℐ) = (a + b) + ℐ
(a + ℐ)(b + ℐ) = ab + ℐ □




Definition An ideal ℐ in a ring R is a maximal ideal if ℐ ≠ R and if whenever
J is an ideal satisfying ℐ ⊆ J ⊆ R, then either J = ℐ or J = R. □


Here is one reason why maximal ideals are important. 


Theorem 0.22 Let R be a commutative ring with identity. Then the quotient
ring R/ℐ is a field if and only if ℐ is a maximal ideal.

Proof. First, note that for any ideal ℐ of R, the ideals of R/ℐ are precisely the
quotients J/ℐ where J is an ideal for which ℐ ⊆ J ⊆ R. It is clear that J/ℐ
is an ideal of R/ℐ. Conversely, if K' is an ideal of R/ℐ, then let


K = {r ∈ R | r + ℐ ∈ K'}


It is easy to see that K is an ideal of R for which ℐ ⊆ K ⊆ R and that K' = K/ℐ.


Next, observe that a commutative ring S with identity is a field if and only if S
has no nonzero proper ideals. For if S is a field and ℐ is an ideal of S
containing a nonzero element r, then 1 = r^{-1}r ∈ ℐ and so ℐ = S. Conversely,
if S has no nonzero proper ideals and 0 ≠ s ∈ S, then the ideal ⟨s⟩ must be S
and so there is an r ∈ S for which rs = 1. Hence, S is a field.


Putting these two facts together proves the theorem. □

The following result says that maximal ideals always exist.


Theorem 0.23 Any nonzero commutative ring R with identity contains a
maximal ideal.

Proof. Since R is not the zero ring, the ideal {0} is a proper ideal of R. Hence,
the set S of all proper ideals of R is nonempty. If

C = {ℐ_i | i ∈ I}

is a chain of proper ideals in R, then the union J = ⋃ ℐ_i is also an ideal.
Furthermore, if J = R is not proper, then 1 ∈ J and so 1 ∈ ℐ_i for some i ∈ I,
which implies that ℐ_i = R is not proper. Hence, J ∈ S. Thus, any chain in S
has an upper bound in S and so Zorn's lemma implies that S has a maximal
element. This shows that R has a maximal ideal. □


Integral Domains 


Definition Let R be a ring. A nonzero element r ∈ R is called a zero divisor if
there exists a nonzero s ∈ R for which rs = 0. A commutative ring R with
identity is called an integral domain if it contains no zero divisors. □


Example 0.14 If n > 1 is not a prime number, then the ring Z_n has zero divisors
and so is not an integral domain. To see this, observe that if n is not prime, then
n = ab in Z, where a, b ≥ 2. But in Z_n, we have


a ⊙ b = ab mod n = 0

and so a and b are both zero divisors. As we will see later, if n is a prime, then
Z_n is a field (which is an integral domain, of course). □
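For small n, the zero divisors of Z_n can simply be listed. The following sketch is our own illustration, and the function name zero_divisors is hypothetical; it searches for nonzero r and s with rs mod n = 0.

```python
def zero_divisors(n):
    """Nonzero r in Z_n for which r*s = 0 in Z_n for some nonzero s."""
    return [r for r in range(1, n)
            if any((r * s) % n == 0 for s in range(1, n))]

divisors_in_Z6 = zero_divisors(6)   # 6 = 2 * 3 is composite
divisors_in_Z7 = zero_divisors(7)   # 7 is prime
divisors_in_Z4 = zero_divisors(4)   # 4 = 2 * 2 is composite
```

The list is empty for n = 7, in line with the remark that Z_n is a field when n is prime, and nonempty for the composite moduli 4 and 6.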


Example 0.15 The ring F[x] is an integral domain, since p(x)q(x) = 0 implies
that p(x) = 0 or q(x) = 0. □


If R is a ring and rx = ry where r, x, y ∈ R, then we cannot in general cancel
the r's and conclude that x = y. For instance, in Z_4, we have 2·3 = 2·1, but
canceling the 2's would give the false statement 3 = 1. However, it is precisely
the integral domains in which we can cancel. The simple proof is left to the reader.


Theorem 0.24 Let R be a commutative ring with identity. Then R is an integral
domain if and only if the cancellation law

rx = ry, r ≠ 0 ⇒ x = y

holds. □

The Field of Quotients of an Integral Domain


Any integral domain R can be embedded in a field. The quotient field (or field 
of quotients) of R is a field that is constructed from R just as the field of 
rational numbers is constructed from the ring of integers. In particular, we set 


R^+ = {(p, q) | p, q ∈ R, q ≠ 0}


where (p, q) = (p', q') if and only if pq' = p'q. Addition and multiplication of
fractions are defined by


(p, q) + (r, s) = (ps + qr, qs)
and
(p, q) · (r, s) = (pr, qs)


It is customary to write (p,q) in the form p/q. Note that if R has zero divisors, 
then these definitions do not make sense, because qs may be 0 even if q and s 
are not. This is why we require that R be an integral domain. 
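The construction can be tried out for R = Z, where it should reproduce ordinary fraction arithmetic. The class below is a minimal sketch under that assumption; the name Quotient and its methods are ours and not part of the text. Equality is tested by cross multiplication, exactly as in the definition of (p, q) = (p', q').

```python
class Quotient:
    """A pair (p, q) of integers with q != 0, standing for the fraction p/q."""

    def __init__(self, p, q):
        if q == 0:
            raise ZeroDivisionError("the second coordinate must be nonzero")
        self.p, self.q = p, q

    def __eq__(self, other):
        # (p, q) = (p', q') if and only if pq' = p'q
        return self.p * other.q == other.p * self.q

    def __add__(self, other):
        # (p, q) + (r, s) = (ps + qr, qs)
        return Quotient(self.p * other.q + self.q * other.p, self.q * other.q)

    def __mul__(self, other):
        # (p, q) . (r, s) = (pr, qs)
        return Quotient(self.p * other.p, self.q * other.q)

half = Quotient(1, 2)
third = Quotient(1, 3)
total = half + third                          # represents 5/6
product = Quotient(2, 3) * Quotient(3, 4)     # represents 1/2

# In Z_6, which has zero divisors, the denominators 2 and 3 multiply to 0,
# so the pair (ps + qr, qs) would not be a legal element.
bad_denominator = (2 * 3) % 6
```

The last line shows concretely why the definitions break down when R has zero divisors.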


Principal Ideal Domains 


Definition Let R be a ring with identity and let a ∈ R. The principal ideal
generated by a is the ideal


⟨a⟩ = {ra | r ∈ R}


An integral domain R in which every ideal is a principal ideal is called a
principal ideal domain. □




Theorem 0.25 The integers form a principal ideal domain. In fact, any nonzero
ideal ℐ in Z is generated by the smallest positive integer a that is contained in ℐ. □
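For an ideal of Z generated by two positive integers a and b, the smallest positive element is gcd(a, b), and it can be written as an integer combination of a and b. The following sketch of the extended Euclidean algorithm illustrates this; it is a standard technique rather than anything specific to this text, and the function name is our own.

```python
from math import gcd

def extended_gcd(a, b):
    """Return (g, r, s) with g = gcd(a, b) = r*a + s*b, for positive a and b."""
    old_rem, rem = a, b
    old_r, r = 1, 0
    old_s, s = 0, 1
    while rem != 0:
        q = old_rem // rem
        old_rem, rem = rem, old_rem - q * rem
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    return old_rem, old_r, old_s

g, r, s = extended_gcd(12, 18)

# Smallest positive element among a sample of combinations 12x + 18y.
sample = [12 * x + 18 * y for x in range(-5, 6) for y in range(-5, 6)]
smallest_positive = min(v for v in sample if v > 0)
```

Here g is a member of the ideal generated by 12 and 18, and every combination 12x + 18y is a multiple of g, so the ideal is generated by g alone.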


Theorem 0.26 The ring F[x] is a principal ideal domain. In fact, any nonzero
ideal ℐ is generated by the unique monic polynomial of smallest degree contained
in ℐ. Moreover, for polynomials p_1(x), ..., p_n(x),


⟨p_1(x), ..., p_n(x)⟩ = ⟨gcd{p_1(x), ..., p_n(x)}⟩


Proof. Let ℐ be a nonzero ideal in F[x] and let m(x) be a monic polynomial of
smallest degree in ℐ. First, we observe that there is only one such polynomial in
ℐ. For if n(x) ∈ ℐ is monic and deg(n(x)) = deg(m(x)), then

b(x) = m(x) − n(x) ∈ ℐ

and since deg(b(x)) < deg(m(x)), we must have b(x) = 0 (otherwise a scalar
multiple of b(x) would be a monic polynomial in ℐ of smaller degree) and so
n(x) = m(x).

We show that ℐ = ⟨m(x)⟩. Since m(x) ∈ ℐ, we have ⟨m(x)⟩ ⊆ ℐ. To establish
the reverse inclusion, if p(x) ∈ ℐ, then dividing p(x) by m(x) gives

p(x) = q(x)m(x) + r(x)

where r(x) = 0 or 0 ≤ deg r(x) < deg m(x). But since ℐ is an ideal,

r(x) = p(x) − q(x)m(x) ∈ ℐ

and so 0 ≤ deg r(x) < deg m(x) is impossible. Hence, r(x) = 0 and

p(x) = q(x)m(x) ∈ ⟨m(x)⟩

This shows that ℐ ⊆ ⟨m(x)⟩ and so ℐ = ⟨m(x)⟩.

To prove the second statement, let ℐ = ⟨p_1(x), ..., p_n(x)⟩. Then, by what we
have just shown,

ℐ = ⟨p_1(x), ..., p_n(x)⟩ = ⟨m(x)⟩


where m(x) is the unique monic polynomial in ℐ of smallest degree. In
particular, since p_i(x) ∈ ⟨m(x)⟩, we have m(x) | p_i(x) for each i = 1, ..., n.
In other words, m(x) is a common divisor of the p_i(x)'s.


Moreover, if q(x) | p_i(x) for all i, then p_i(x) ∈ ⟨q(x)⟩ for all i, which implies
that


m(x) ∈ ⟨m(x)⟩ = ⟨p_1(x), ..., p_n(x)⟩ ⊆ ⟨q(x)⟩


and so q(x) | m(x). This shows that m(x) is the greatest common divisor of the
p_i(x)'s and completes the proof. □
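The monic generator of the ideal generated by two polynomials can be computed with the Euclidean algorithm, which repeatedly applies the division step used in the proof. The sketch below works over Q and assumes polynomials are stored as coefficient lists from the constant term upward, with at least one input nonzero. All function names are illustrative.

```python
from fractions import Fraction

def trim(p):
    """Convert to Fractions and drop zero leading coefficients."""
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p

def divmod_poly(p, m):
    """Return (q, r) with p = q*m + r and r = 0 or deg r < deg m."""
    p, m = trim(p), trim(m)
    if not m:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(p) - len(m) + 1, 1)
    r = p
    while len(r) >= len(m):
        shift = len(r) - len(m)
        factor = r[-1] / m[-1]
        q[shift] = factor
        # subtract factor * x^shift * m(x), which cancels the leading term of r
        r = trim([c - factor * m[i - shift] if i >= shift else c
                  for i, c in enumerate(r)])
    return trim(q), r

def monic_gcd(p, q):
    """Monic generator of the ideal generated by p and q (not both zero)."""
    p, q = trim(p), trim(q)
    while q:
        p, q = q, divmod_poly(p, q)[1]
    lead = p[-1]
    return [c / lead for c in p]

p1 = [-1, 0, 1]    # x^2 - 1
p2 = [2, -3, 1]    # x^2 - 3x + 2
m = monic_gcd(p1, p2)
```

For these inputs m represents x − 1, and dividing either p1 or p2 by m leaves remainder zero, as the theorem predicts.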




Example 0.16 The ring R = F[x, y] of polynomials in two variables x and y is
not a principal ideal domain. To see this, observe that the set ℐ of all
polynomials with zero constant term is an ideal in R. Now, suppose that ℐ is the
principal ideal ℐ = ⟨p(x, y)⟩. Since x, y ∈ ℐ, there exist polynomials a(x, y)
and b(x, y) for which


x = a(x, y)p(x, y) and y = b(x, y)p(x, y) (0.1)


But p(x, y) cannot be a constant, for then we would have ℐ = R. Hence,
deg(p(x, y)) ≥ 1 and so a(x, y) and b(x, y) must both be constants, which
implies that (0.1) cannot hold. □


Theorem 0.27 Any principal ideal domain R satisfies the ascending chain
condition, that is, R cannot have a strictly increasing sequence of ideals


ℐ_1 ⊂ ℐ_2 ⊂ ···

where each ideal is properly contained in the next one.


Proof. Suppose to the contrary that there is such an increasing sequence of
ideals. Consider the ideal

U = ⋃ ℐ_i


which must have the form U = ⟨a⟩ for some a ∈ U. Since a ∈ ℐ_k for some k,
we have U = ⟨a⟩ ⊆ ℐ_k ⊆ ℐ_j ⊆ U and hence ℐ_k = ℐ_j for all j ≥ k, contradicting
the fact that the inclusions are proper. □


Prime and Irreducible Elements 


We can define the notion of a prime element in any integral domain. For
r, s ∈ R, we say that r divides s (written r | s) if there exists an x ∈ R for
which s = xr.


Definition Let R be an integral domain.

1) An invertible element of R is called a unit. Thus, u ∈ R is a unit if uv = 1
   for some v ∈ R.

2) Two elements a, b ∈ R are said to be associates if there exists a unit u for
   which a = ub. We denote this by writing a ~ b.

3) A nonzero nonunit p ∈ R is said to be prime if

   p | ab ⇒ p | a or p | b

4) A nonzero nonunit r ∈ R is said to be irreducible if

   r = ab ⇒ a or b is a unit □


Note that if p is prime or irreducible, then so is up for any unit u. 


The property of being associate is clearly an equivalence relation. 




Definition We will refer to the equivalence classes under the relation of being
associate as the associate classes of $R$. □


Theorem 0.28 Let $R$ be a ring.

1) An element $u \in R$ is a unit if and only if $(u) = R$.

2) $r \sim s$ if and only if $(r) = (s)$.

3) $r$ divides $s$ if and only if $(s) \subseteq (r)$.

4) $r$ properly divides $s$, that is, $s = xr$ where $x$ is not a unit, if and only if
$(s) \subset (r)$. □


In the case of the integers, an integer is prime if and only if it is irreducible. In
any integral domain, prime elements are irreducible, but the converse need not
hold. (In the ring $\mathbb{Z}[\sqrt{-5}] = \{a + b\sqrt{-5} \mid a, b \in \mathbb{Z}\}$ the irreducible element 2
divides the product $(1 + \sqrt{-5})(1 - \sqrt{-5}) = 6$ but does not divide either
factor.)
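
The parenthetical example can be checked by direct computation. The following is a minimal sketch, assuming that elements $a + b\sqrt{-5}$ are stored as integer pairs `(a, b)`; the helper names `mul`, `divisible_by_2` and `norm` are illustrative and not part of the text. It also uses the multiplicative norm $a^2 + 5b^2$ (a standard tool, not introduced above) to indicate why 2 is irreducible: a factorization of 2 into two nonunits would require an element of norm 2, and none exists.

```python
# Elements a + b*sqrt(-5) of Z[sqrt(-5)] are modelled as integer pairs (a, b).

def mul(x, y):
    """Multiply (a + b*sqrt(-5)) by (c + d*sqrt(-5))."""
    a, b = x
    c, d = y
    return (a * c - 5 * b * d, a * d + b * c)

def divisible_by_2(x):
    """2 divides a + b*sqrt(-5) exactly when both a and b are even."""
    a, b = x
    return a % 2 == 0 and b % 2 == 0

def norm(x):
    """Multiplicative norm N(a + b*sqrt(-5)) = a^2 + 5*b^2."""
    a, b = x
    return a * a + 5 * b * b

u = (1, 1)     # 1 + sqrt(-5)
v = (1, -1)    # 1 - sqrt(-5)
product = mul(u, v)

# An element of norm 2 would need |a| <= 1 and b = 0, so a small search suffices.
has_norm_2 = any(norm((a, b)) == 2 for a in range(-2, 3) for b in range(-1, 2))

print(product, divisible_by_2(product), divisible_by_2(u), divisible_by_2(v))
print("element of norm 2 exists:", has_norm_2)
```

Here 2 divides the product but neither factor, so 2 is not prime in this ring even though it is irreducible.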


However, in principal ideal domains, the two concepts are equivalent. 


Theorem 0.29 Let $R$ be a principal ideal domain.

1) An $r \in R$ is irreducible if and only if the ideal $(r)$ is maximal.

2) An element in $R$ is prime if and only if it is irreducible.

3) The elements $a, b \in R$ are relatively prime, that is, have no common
nonunit factors, if and only if there exist $r, s \in R$ for which

$$ra + sb = 1$$

This is denoted by writing $(a, b) = 1$.

Proof. To prove 1), suppose that $r$ is irreducible and that $(r) \subseteq (a) \subseteq R$. Then
$r \in (a)$ and so $r = xa$ for some $x \in R$. The irreducibility of $r$ implies that $a$ or
$x$ is a unit. If $a$ is a unit, then $(a) = R$ and if $x$ is a unit, then $(a) = (xa) = (r)$.
This shows that $(r)$ is maximal. (We have $(r) \neq R$, since $r$ is not a unit.)
Conversely, suppose that $r$ is not irreducible, that is, $r = ab$ where neither $a$ nor
$b$ is a unit. Then $(r) \subseteq (a) \subseteq R$. But if $(a) = (r)$, then $r \sim a$, which implies that
$b$ is a unit. Hence $(r) \neq (a)$. Also, if $(a) = R$, then $a$ must be a unit. So we
conclude that $(r)$ is not maximal, as desired.


To prove 2), assume first that p is prime and p = ab. Then p | a or p | b. We 
may assume that p | a. Therefore, a = xp = xab. Canceling a's gives 1 = xb 
and so b is a unit. Hence, p is irreducible. (Note that this argument applies in 
any integral domain.) 


Conversely, suppose that $r$ is irreducible and let $r \mid ab$. We wish to prove that
$r \mid a$ or $r \mid b$. The ideal $(r)$ is maximal and so $(r, a) = (r)$ or $(r, a) = R$. In the
former case, $r \mid a$ and we are done. In the latter case, we have

$$1 = xa + yr$$

for some $x, y \in R$. Thus,

$$b = xab + yrb$$

and since $r$ divides both terms on the right, we have $r \mid b$.


To prove 3), it is clear that if $ra + sb = 1$, then $a$ and $b$ are relatively prime. For
the converse, consider the ideal $(a, b)$, which must be principal, say
$(a, b) = (x)$. Then $x \mid a$ and $x \mid b$ and so $x$ must be a unit, which implies that
$(a, b) = R$. Hence, there exist $r, s \in R$ for which $ra + sb = 1$. □


Unique Factorization Domains 


Definition An integral domain $R$ is said to be a unique factorization domain
if it has the following factorization properties:

1) Every nonzero nonunit element $r \in R$ can be written as a product of a finite
number of irreducible elements $r = p_1 \cdots p_n$.

2) The factorization into irreducible elements is unique in the sense that if
$r = p_1 \cdots p_n$ and $r = q_1 \cdots q_m$ are two such factorizations, then $m = n$ and
after a suitable reindexing of the factors, $p_i \sim q_i$. □


Unique factorization is clearly a desirable property. Fortunately, principal ideal 
domains have this property. 


Theorem 0.30 Every principal ideal domain R is a unique factorization 
domain. 

Proof. Let $r \in R$ be a nonzero nonunit. If $r$ is irreducible, then we are done. If
not, then $r = r_1 r_2$, where neither factor is a unit. If $r_1$ and $r_2$ are irreducible, we
are done. If not, suppose that $r_2$ is not irreducible. Then $r_2 = r_3 r_4$, where
neither $r_3$ nor $r_4$ is a unit. Continuing in this way, we obtain a factorization of
the form (after renumbering if necessary)

$$r = r_1 r_2 = r_1 (r_3 r_4) = (r_1 r_3)(r_5 r_6) = (r_1 r_3 r_5)(r_7 r_8) = \cdots$$

Each step is a factorization of $r$ into a product of nonunits. However, this
process must stop after a finite number of steps, for otherwise it will produce an
infinite sequence $s_1, s_2, \ldots$ of nonunits of $R$ for which $s_{i+1}$ properly divides $s_i$.
But this gives the ascending chain of ideals

$$(s_1) \subset (s_2) \subset (s_3) \subset (s_4) \subset \cdots$$

where the inclusions are proper. But this contradicts the fact that a principal
ideal domain satisfies the ascending chain condition. Thus, we conclude that
every nonzero nonunit has a factorization into irreducible elements.


As to uniqueness, if $r = p_1 \cdots p_n$ and $r = q_1 \cdots q_m$ are two such factorizations,
then because $R$ is an integral domain, we may equate them and cancel like
factors, so let us assume this has been done. Thus, $p_i \neq q_j$ for all $i, j$. If there are
no factors on either side, we are done. If exactly one side has no factors left,
then we have expressed 1 as a product of irreducible elements, which is not
possible since irreducible elements are nonunits.

Suppose that both sides have factors left, that is,

$$p_1 \cdots p_n = q_1 \cdots q_m$$

where $p_i \neq q_j$. Then $q_m \mid p_1 \cdots p_n$, which implies that $q_m \mid p_i$ for some $i$. We can
assume by reindexing if necessary that $p_n = a_n q_m$. Since $p_n$ is irreducible, $a_n$
must be a unit. Replacing $p_n$ by $a_n q_m$ and canceling $q_m$ gives

$$a_n p_1 \cdots p_{n-1} = q_1 \cdots q_{m-1}$$

This process can be repeated until we run out of $q$'s or $p$'s. If we run out of $q$'s
first, then we have an equation of the form $u p_1 \cdots p_k = 1$ where $u$ is a unit,
which is not possible since the $p_i$'s are not units. By the same reasoning, we
cannot run out of $p$'s first and so $n = m$ and the $p$'s and $q$'s can be paired off as
associates. □


Fields 


For the record, let us give the definition of a field (a concept that we have been 
using). 


Definition A field is a set F, containing at least two elements, together with two 

binary operations, called addition (denoted by +) and multiplication 

(denoted by juxtaposition), for which the following hold: 

1) F is an abelian group under addition. 

2) The set F* of all nonzero elements in F is an abelian group under 
multiplication. 

3) (Distributivity) For all $a, b, c \in F$,

$$(a + b)c = ac + bc \quad\text{and}\quad c(a + b) = ca + cb$$ □

We require that $F$ have at least two elements to avoid the pathological case in
which $0 = 1$.


Example 0.17 The sets $\mathbb{Q}$, $\mathbb{R}$ and $\mathbb{C}$, of all rational, real and complex numbers,
respectively, are fields, under the usual operations of addition and multiplication
of numbers. □

Example 0.18 The ring $\mathbb{Z}_n$ is a field if and only if $n$ is a prime number. We
have already seen that $\mathbb{Z}_n$ is not a field if $n$ is not prime, since a field is also an
integral domain. Now suppose that $n = p$ is a prime.

We have seen that $\mathbb{Z}_p$ is an integral domain and so it remains to show that every
nonzero element in $\mathbb{Z}_p$ has a multiplicative inverse. Let $0 \neq a \in \mathbb{Z}_p$. Since
$a < p$, we know that $a$ and $p$ are relatively prime. It follows that there exist
integers $u$ and $v$ for which

$$ua + vp = 1$$

Hence,

$$ua = (1 - vp) \equiv 1 \bmod p$$

and so $u \odot a = 1$ in $\mathbb{Z}_p$, that is, $u$ is the multiplicative inverse of $a$. □
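
The integers $u$ and $v$ in this argument can be computed explicitly. The sketch below uses the extended Euclidean algorithm, which is one standard way to produce them; the text only asserts that they exist, so the algorithm and the function names `extended_gcd` and `inverse_mod_p` are assumptions made for illustration.

```python
def extended_gcd(a, b):
    """Return (g, u, v) with g = gcd(a, b) and u*a + v*b == g."""
    old_r, r = a, b
    old_u, u = 1, 0
    old_v, v = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_u, u = u, old_u - q * u
        old_v, v = v, old_v - q * v
    return old_r, old_u, old_v

def inverse_mod_p(a, p):
    """Multiplicative inverse of a nonzero residue a in Z_p, for p prime."""
    g, u, _ = extended_gcd(a % p, p)
    if g != 1:
        raise ValueError("a is not invertible modulo p")
    return u % p    # reduce the Bezout coefficient u into the range 0..p-1

p = 7
inverses = {a: inverse_mod_p(a, p) for a in range(1, p)}
print(inverses)
```

Every nonzero residue modulo 7 receives an inverse, in line with the claim that $\mathbb{Z}_p$ is a field.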


The previous example shows that not all fields are infinite sets. In fact, finite 
fields play an extremely important role in many areas of abstract and applied 
mathematics. 


A field $F$ is said to be algebraically closed if every nonconstant polynomial
over $F$ has a root in $F$. This is equivalent to saying that every nonconstant
polynomial splits over $F$. For example, the complex field $\mathbb{C}$ is algebraically
closed but the real field $\mathbb{R}$ is not. We mention without proof that every field $F$ is
contained in an algebraically closed field $\overline{F}$, called the algebraic closure of $F$.
For example, the algebraic closure of the real field is the complex field.


The Characteristic of a Ring 


Let $R$ be a ring with identity. If $n$ is a positive integer, then by $n \cdot r$, we simply
mean

$$n \cdot r = \underbrace{r + \cdots + r}_{n \text{ terms}}$$

Now, it may happen that there is a positive integer $n$ for which

$$n \cdot 1 = 0$$

For instance, in $\mathbb{Z}_n$, we have $n \cdot 1 = n = 0$. On the other hand, in $\mathbb{Z}$, the
equation $n \cdot 1 = 0$ implies $n = 0$ and so no such positive integer exists.

Notice that in any finite ring, there must exist such a positive integer $n$, since the
members of the infinite sequence of numbers

$$1 \cdot 1,\; 2 \cdot 1,\; 3 \cdot 1,\; \ldots$$

cannot be distinct and so $i \cdot 1 = j \cdot 1$ for some $i < j$, whence $(j - i) \cdot 1 = 0$.

Definition Let $R$ be a ring with identity. The smallest positive integer $c$ for
which $c \cdot 1 = 0$ is called the characteristic of $R$. If no such number $c$ exists, we
say that $R$ has characteristic 0. The characteristic of $R$ is denoted by
$\operatorname{char}(R)$. □

If $\operatorname{char}(R) = c$, then for any $r \in R$, we have

$$c \cdot r = \underbrace{r + \cdots + r}_{c \text{ terms}} = (\underbrace{1 + \cdots + 1}_{c \text{ terms}})\, r = 0 \cdot r = 0$$
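
For a finite ring the definition can be applied literally: keep adding the identity to itself until the sum is zero. The following sketch does this for $\mathbb{Z}_6$, $\mathbb{Z}_7$ and, as an additional illustration that does not appear in the text, the product ring $\mathbb{Z}_4 \times \mathbb{Z}_6$ with componentwise addition. The function `characteristic` and its parameters are hypothetical names chosen for this example.

```python
def characteristic(one, add, zero):
    """Smallest positive c with c*1 = 0, found by repeatedly adding 1.

    The loop terminates only when such a c exists, for example in any
    finite ring; it would run forever in a ring of characteristic 0.
    """
    total, c = one, 1
    while total != zero:
        total = add(total, one)
        c += 1
    return c

def add_mod(n):
    """Addition in Z_n."""
    return lambda x, y: (x + y) % n

def add_pair(x, y):
    """Componentwise addition in Z_4 x Z_6."""
    return ((x[0] + y[0]) % 4, (x[1] + y[1]) % 6)

char_Z6 = characteristic(1, add_mod(6), 0)
char_Z7 = characteristic(1, add_mod(7), 0)
char_Z4xZ6 = characteristic((1, 1), add_pair, (0, 0))
print(char_Z6, char_Z7, char_Z4xZ6)
```

The product ring has characteristic 12, the least common multiple of 4 and 6, which is not prime; this is consistent with the fact that $\mathbb{Z}_4 \times \mathbb{Z}_6$ has zero divisors and so is not an integral domain.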




Theorem 0.31 Any finite ring has nonzero characteristic. Any finite integral 
domain has prime characteristic. 

Proof. We have already seen that a finite ring has nonzero characteristic. Let $F$
be a finite integral domain and suppose that $\operatorname{char}(F) = c > 0$. If $c = pq$, where
$p, q < c$, then $pq \cdot 1 = 0$. Hence, $(p \cdot 1)(q \cdot 1) = 0$, implying that $p \cdot 1 = 0$ or
$q \cdot 1 = 0$. In either case, we have a contradiction to the fact that $c$ is the smallest
positive integer such that $c \cdot 1 = 0$. Hence, $c$ must be prime. □


Notice that in any field $F$ of characteristic 2, we have $2a = 0$ for all $a \in F$.
Thus, in $F$,

$$a = -a \quad\text{for all } a \in F$$

This property takes a bit of getting used to and makes fields of characteristic 2
quite exceptional. (As it happens, there are many important uses for fields of
characteristic 2.) It can be shown that all finite fields have size equal to a
positive integral power $p^n$ of a prime $p$ and for each prime power $p^n$, there is a
finite field of size $p^n$. In fact, up to isomorphism, there is exactly one finite field
of size $p^n$.


Algebras 


The final algebraic structure of which we will have use is a combination of a 
vector space and a ring. (We have not yet officially defined vector spaces, but 
we will do so before needing the following definition, which is placed here for 
easy reference.) 


Definition An algebra A over a field F is a nonempty set A, together with 
three operations, called addition (denoted by + ), multiplication (denoted by 
juxtaposition) and scalar multiplication (also denoted by juxtaposition), for 
which the following properties hold: 

1) $A$ is a vector space over $F$ under addition and scalar multiplication.

2) $A$ is a ring under addition and multiplication.

3) If $r \in F$ and $a, b \in A$, then

$$r(ab) = (ra)b = a(rb)$$ □

Thus, an algebra is a vector space in which we can take the product of vectors,
or a ring in which we can multiply each element by a scalar (subject, of course,
to additional requirements as given in the definition).


Part I—Basic Linear Algebra 


Chapter 1 
Vector Spaces 


Vector Spaces 


Let us begin with the definition of one of our principal objects of study. 


Definition Let $F$ be a field, whose elements are referred to as scalars. A vector
space over $F$ is a nonempty set $V$, whose elements are referred to as vectors,
together with two operations. The first operation, called addition and denoted
by $+$, assigns to each pair $(u, v)$ of vectors in $V$ a vector $u + v$ in $V$. The
second operation, called scalar multiplication and denoted by juxtaposition,
assigns to each pair $(r, u) \in F \times V$ a vector $ru$ in $V$. Furthermore, the
following properties must be satisfied:

1) (Associativity of addition) For all vectors $u, v, w \in V$,

$$u + (v + w) = (u + v) + w$$

2) (Commutativity of addition) For all vectors $u, v \in V$,

$$u + v = v + u$$

3) (Existence of a zero) There is a vector $0 \in V$ with the property that

$$0 + u = u + 0 = u$$

for all vectors $u \in V$.

4) (Existence of additive inverses) For each vector $u \in V$, there is a vector
in $V$, denoted by $-u$, with the property that

$$u + (-u) = (-u) + u = 0$$

5) (Properties of scalar multiplication) For all scalars $a, b \in F$ and for all
vectors $u, v \in V$,

$$a(u + v) = au + av$$
$$(a + b)u = au + bu$$
$$(ab)u = a(bu)$$
$$1u = u$$ □


Note that the first four properties in the definition of vector space can be 
summarized by saying that V is an abelian group under addition. 


A vector space over a field $F$ is sometimes called an $F$-space. A vector space
over the real field is called a real vector space and a vector space over the 
complex field is called a complex vector space. 


Definition Let $S$ be a nonempty subset of a vector space $V$. A linear
combination of vectors in $S$ is an expression of the form

$$a_1 v_1 + \cdots + a_n v_n$$

where $v_1, \ldots, v_n \in S$ and $a_1, \ldots, a_n \in F$. The scalars $a_i$ are called the
coefficients of the linear combination. A linear combination is trivial if every
coefficient $a_i$ is zero. Otherwise, it is nontrivial. □


Examples of Vector Spaces 


Here are a few examples of vector spaces. 


Example 1.1 

1) Let $F$ be a field. The set $F^F$ of all functions from $F$ to $F$ is a vector space
over $F$, under the operations of ordinary addition and scalar multiplication
of functions:

$$(f + g)(x) = f(x) + g(x)$$

and

$$(af)(x) = a(f(x))$$

2) The set $\mathcal{M}_{m,n}(F)$ of all $m \times n$ matrices with entries in a field $F$ is a vector
space over $F$, under the operations of matrix addition and scalar
multiplication.

3) The set $F^n$ of all ordered $n$-tuples whose components lie in a field $F$, is a
vector space over $F$, with addition and scalar multiplication defined
componentwise:

$$(a_1, \ldots, a_n) + (b_1, \ldots, b_n) = (a_1 + b_1, \ldots, a_n + b_n)$$

and

$$c(a_1, \ldots, a_n) = (ca_1, \ldots, ca_n)$$

When convenient, we will also write the elements of $F^n$ in column form.
When $F$ is a finite field $F_q$ with $q$ elements, we write $V(n, q)$ for $F_q^n$.

4) Many sequence spaces are vector spaces. The set $\operatorname{Seq}(F)$ of all infinite
sequences with members from a field $F$ is a vector space under the
componentwise operations

$$(s_n) + (t_n) = (s_n + t_n)$$

and

$$a(s_n) = (as_n)$$

In a similar way, the set $c_0$ of all sequences of complex numbers that
converge to 0 is a vector space, as is the set $\ell^\infty$ of all bounded complex
sequences. Also, if $p$ is a positive integer, then the set $\ell^p$ of all complex
sequences $(s_n)$ for which

$$\sum_{n=1}^{\infty} |s_n|^p < \infty$$

is a vector space under componentwise operations. To see that addition is a
binary operation on $\ell^p$, one verifies Minkowski's inequality

$$\left( \sum_{n=1}^{\infty} |s_n + t_n|^p \right)^{1/p} \leq \left( \sum_{n=1}^{\infty} |s_n|^p \right)^{1/p} + \left( \sum_{n=1}^{\infty} |t_n|^p \right)^{1/p}$$

which we will not do here. □


Subspaces 


Most algebraic structures contain substructures, and vector spaces are no 
exception. 


Definition A subspace of a vector space $V$ is a subset $S$ of $V$ that is a vector
space in its own right under the operations obtained by restricting the
operations of $V$ to $S$. We use the notation $S \leq V$ to indicate that $S$ is a
subspace of $V$ and $S < V$ to indicate that $S$ is a proper subspace of $V$, that is,
$S \leq V$ but $S \neq V$. The zero subspace of $V$ is $\{0\}$. □


Since many of the properties of addition and scalar multiplication hold a fortiori 
in a nonempty subset S, we can establish that S is a subspace merely by 
checking that S is closed under the operations of V. 


Theorem 1.1 A nonempty subset $S$ of a vector space $V$ is a subspace of $V$ if
and only if $S$ is closed under addition and scalar multiplication or, equivalently,
$S$ is closed under linear combinations, that is,

$$a, b \in F,\; u, v \in S \;\Rightarrow\; au + bv \in S$$ □


Example 1.2 Consider the vector space $V(n, 2)$ of all binary $n$-tuples, that is,
$n$-tuples of 0's and 1's. The weight $\mathcal{W}(v)$ of a vector $v \in V(n, 2)$ is the number
of nonzero coordinates in $v$. For instance, $\mathcal{W}(101010) = 3$. Let $E_n$ be the set of
all vectors in $V(n, 2)$ of even weight. Then $E_n$ is a subspace of $V(n, 2)$.

To see this, note that

$$\mathcal{W}(u + v) = \mathcal{W}(u) + \mathcal{W}(v) - 2\mathcal{W}(u \cap v)$$

where $u \cap v$ is the vector in $V(n, 2)$ whose $i$th component is the product of the
$i$th components of $u$ and $v$, that is,

$$(u \cap v)_i = u_i \cdot v_i$$

Hence, if $\mathcal{W}(u)$ and $\mathcal{W}(v)$ are both even, so is $\mathcal{W}(u + v)$. Finally, scalar
multiplication over $F_2$ is trivial and so $E_n$ is a subspace of $V(n, 2)$, known as
the even weight subspace of $V(n, 2)$. □
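
The weight identity and the closure of $E_n$ can be confirmed by brute force for a small $n$. The sketch below takes $n = 4$ and models vectors as tuples of 0's and 1's; the helper names `add`, `meet` and `weight` are chosen for illustration only.

```python
from itertools import product

n = 4
V = list(product((0, 1), repeat=n))    # all binary n-tuples

def add(u, v):
    """Componentwise addition over F_2."""
    return tuple((a + b) % 2 for a, b in zip(u, v))

def meet(u, v):
    """The vector u ∩ v: componentwise product of u and v."""
    return tuple(a * b for a, b in zip(u, v))

def weight(v):
    """Number of nonzero coordinates of v."""
    return sum(v)

E = [v for v in V if weight(v) % 2 == 0]    # the even weight vectors

identity_holds = all(
    weight(add(u, v)) == weight(u) + weight(v) - 2 * weight(meet(u, v))
    for u in V for v in V
)
closed = all(add(u, v) in E for u in E for v in E)
print(len(E), identity_holds, closed)
```

For $n = 4$ exactly half of the 16 vectors have even weight, and their set is closed under addition.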


Example 1.3 Any subspace of the vector space V (n, q) is called a linear code. 
Linear codes are among the most important and most studied types of codes, 
because their structure allows for efficient encoding and decoding of 
information. □


The Lattice of Subspaces 


The set S(V) of all subspaces of a vector space V is partially ordered by set 
inclusion. The zero subspace {0} is the smallest element in S(V) and the entire 
space V is the largest element. 


If $S, T \in \mathcal{S}(V)$, then $S \cap T$ is the largest subspace of $V$ that is contained in
both $S$ and $T$. In terms of set inclusion, $S \cap T$ is the greatest lower bound of $S$
and $T$:

$$S \cap T = \operatorname{glb}\{S, T\}$$

Similarly, if $\{S_i \mid i \in K\}$ is any collection of subspaces of $V$, then their
intersection is the greatest lower bound of the subspaces:

$$\bigcap_{i \in K} S_i = \operatorname{glb}\{S_i \mid i \in K\}$$

On the other hand, if $S, T \in \mathcal{S}(V)$ (and $F$ is infinite), then $S \cup T \in \mathcal{S}(V)$ if
and only if $S \subseteq T$ or $T \subseteq S$. Thus, the union of two subspaces is never a
subspace in any “interesting” case. We also have the following.




Theorem 1.2 A nontrivial vector space $V$ over an infinite field $F$ is not the
union of a finite number of proper subspaces.

Proof. Suppose that $V = S_1 \cup \cdots \cup S_n$, where we may assume that

$$S_1 \not\subseteq S_2 \cup \cdots \cup S_n$$

Let $w \in S_1 \setminus (S_2 \cup \cdots \cup S_n)$ and let $v \notin S_1$. Consider the infinite set

$$A = \{rw + v \mid r \in F\}$$

which is the “line” through $v$, parallel to $w$. We want to show that each $S_i$
contains at most one vector from the infinite set $A$, which is contrary to the fact
that $V = S_1 \cup \cdots \cup S_n$. This will prove the theorem.

If $rw + v \in S_1$ for $r \neq 0$, then $w \in S_1$ implies $v \in S_1$, contrary to assumption.
Next, suppose that $r_1 w + v \in S_i$ and $r_2 w + v \in S_i$, for $i \geq 2$, where $r_1 \neq r_2$.
Then

$$S_i \ni (r_1 w + v) - (r_2 w + v) = (r_1 - r_2)w$$

and so $w \in S_i$, which is also contrary to assumption. □
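
The hypothesis that $F$ is infinite cannot be dropped. As an illustration that does not appear in the text, the sketch below checks that the plane $V(2, 2)$ over the two-element field is the union of its three one-dimensional subspaces; the helper `span_of` is a hypothetical name.

```python
from itertools import product

V = set(product((0, 1), repeat=2))    # the four vectors of the plane over F_2

def span_of(v):
    """The subspace spanned by v over F_2, which is just {0, v}."""
    return {(0, 0), v}

lines = [span_of(v) for v in [(1, 0), (0, 1), (1, 1)]]
union = set().union(*lines)

covers = (union == V)                           # the three lines fill the plane
all_proper = all(line != V for line in lines)   # each line is a proper subspace
print(covers, all_proper)
```

Over $F_2$ the "line" $A$ in the proof has only two points, so the counting argument that drives the proof has nothing to contradict.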


To determine the smallest subspace of $V$ containing the subspaces $S$ and $T$, we
make the following definition.

Definition Let $S$ and $T$ be subspaces of $V$. The sum $S + T$ is defined by

$$S + T = \{u + v \mid u \in S,\; v \in T\}$$

More generally, the sum of any collection $\{S_i \mid i \in K\}$ of subspaces is the set
of all finite sums of vectors from the union $\bigcup S_i$:

$$\sum_{i \in K} S_i = \Big\{ s_1 + \cdots + s_n \;\Big|\; s_j \in \bigcup_{i \in K} S_i \Big\}$$ □


It is not hard to show that the sum of any collection of subspaces of V is a 
subspace of V and that the sum is the least upper bound under set inclusion: 


$$S + T = \operatorname{lub}\{S, T\}$$

More generally,

$$\sum_{i \in K} S_i = \operatorname{lub}\{S_i \mid i \in K\}$$
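
A small finite example makes the contrast between the union and the sum concrete. The sketch below works in $V(3, 2)$, a choice made only so that subspaces can be enumerated; the helpers `span`, `add` and `is_subspace` are illustrative names.

```python
from itertools import product

def add(u, v):
    """Componentwise addition over F_2."""
    return tuple((a + b) % 2 for a, b in zip(u, v))

def span(generators, n):
    """All F_2-linear combinations of the generators, by brute force."""
    vectors = set()
    for coeffs in product((0, 1), repeat=len(generators)):
        total = tuple([0] * n)
        for c, g in zip(coeffs, generators):
            if c:
                total = add(total, g)
        vectors.add(total)
    return vectors

def is_subspace(W):
    """Over F_2, a nonempty set is a subspace iff it is closed under addition."""
    return len(W) > 0 and all(add(u, v) in W for u in W for v in W)

S = span([(1, 0, 0)], 3)
T = span([(0, 1, 0)], 3)
sum_ST = {add(u, v) for u in S for v in T}    # S + T = {u + v | u in S, v in T}

print(is_subspace(S | T), is_subspace(sum_ST))
```

The union $S \cup T$ fails to contain $(1, 1, 0)$, while $S + T$ is a subspace containing both $S$ and $T$, as expected of a least upper bound.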


If a partially ordered set P has the property that every pair of elements has a 
least upper bound and greatest lower bound, then P is called a lattice. If P has 
a smallest element and a largest element and has the property that every 
collection of elements has a least upper bound and greatest lower bound, then P 




is called a complete lattice. The least upper bound of a collection is also called 
the join of the collection and the greatest lower bound is called the meet. 


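Theorem 1.3 The set $\mathcal{S}(V)$ of all subspaces of a vector space $V$ is a complete
lattice under set inclusion, with smallest element $\{0\}$, largest element $V$, meet

$$\operatorname{glb}\{S_i \mid i \in K\} = \bigcap_{i \in K} S_i$$

and join

$$\operatorname{lub}\{S_i \mid i \in K\} = \sum_{i \in K} S_i$$ □
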
Direct Sums 


As we will see, there are many ways to construct new vector spaces from old 
ones. 


External Direct Sums 


Definition Let $V_1, \ldots, V_n$ be vector spaces over a field $F$. The external direct
sum of $V_1, \ldots, V_n$, denoted by

$$V = V_1 \boxplus \cdots \boxplus V_n$$

is the vector space $V$ whose elements are ordered $n$-tuples:

$$V = \{(v_1, \ldots, v_n) \mid v_i \in V_i,\; i = 1, \ldots, n\}$$

with componentwise operations

$$(u_1, \ldots, u_n) + (v_1, \ldots, v_n) = (u_1 + v_1, \ldots, u_n + v_n)$$

and

$$r(v_1, \ldots, v_n) = (rv_1, \ldots, rv_n)$$

for all $r \in F$. □


Example 1.4 The vector space $F^n$ is the external direct sum of $n$ copies of $F$,
that is,

$$F^n = F \boxplus \cdots \boxplus F$$

where there are $n$ summands on the right-hand side. □


This construction can be generalized to any collection of vector spaces by
generalizing the idea that an ordered $n$-tuple $(v_1, \ldots, v_n)$ is just a function
$f\colon \{1, \ldots, n\} \to \bigcup V_i$ from the index set $\{1, \ldots, n\}$ to the union of the spaces
with the property that $f(i) \in V_i$.




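Definition Let $\mathcal{F} = \{V_i \mid i \in K\}$ be any family of vector spaces over $F$. The
direct product of $\mathcal{F}$ is the vector space

$$\prod_{i \in K} V_i = \Big\{ f\colon K \to \bigcup_{i \in K} V_i \;\Big|\; f(i) \in V_i \Big\}$$

thought of as a subspace of the vector space of all functions from $K$ to $\bigcup V_i$. □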


It will prove more useful to restrict the set of functions to those with finite 
support. 


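Definition Let $\mathcal{F} = \{V_i \mid i \in K\}$ be a family of vector spaces over $F$. The
support of a function $f\colon K \to \bigcup V_i$ is the set

$$\operatorname{supp}(f) = \{i \in K \mid f(i) \neq 0\}$$

Thus, a function $f$ has finite support if $f(i) = 0$ for all but a finite number of
$i \in K$. The external direct sum of the family $\mathcal{F}$ is the vector space

$$\bigoplus_{i \in K}^{\text{ext}} V_i = \Big\{ f\colon K \to \bigcup_{i \in K} V_i \;\Big|\; f(i) \in V_i,\; f \text{ has finite support} \Big\}$$

thought of as a subspace of the vector space of all functions from $K$ to $\bigcup V_i$. □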


An important special case occurs when $V_i = V$ for all $i \in K$. If we let $V^K$
denote the set of all functions from $K$ to $V$ and $(V^K)_0$ denote the set of all
functions in $V^K$ that have finite support, then

$$\prod_{i \in K} V = V^K \quad\text{and}\quad \bigoplus_{i \in K}^{\text{ext}} V = (V^K)_0$$
ick ick 


Note that the direct product and the external direct sum are the same for a finite 
family of vector spaces. 
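
Functions with finite support have a natural finite representation even when the index set is infinite: store only the indices where the value is nonzero. The following sketch assumes $K$ is the set of natural numbers and $V = \mathbb{Q}$, both chosen purely for illustration, and represents an element of $(V^K)_0$ as a dictionary from indices to nonzero values.

```python
from fractions import Fraction

def add(f, g):
    """Pointwise sum of two finitely supported functions K -> Q."""
    h = dict(f)
    for i, value in g.items():
        h[i] = h.get(i, 0) + value
    return {i: x for i, x in h.items() if x != 0}    # keep only the support

def scale(r, f):
    """Scalar multiple r*f, again stored by its support."""
    return {i: r * x for i, x in f.items() if r * x != 0}

def support(f):
    """The set of indices at which f is nonzero."""
    return set(f)

f = {0: Fraction(1), 5: Fraction(-2)}
g = {5: Fraction(2), 1000: Fraction(3)}
h = add(f, g)
print(h, support(h))
```

The support of a sum lies inside the union of the supports, which is why the finitely supported functions are closed under addition and form a subspace of the direct product.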


Internal Direct Sums 


An internal version of the direct sum construction is often more relevant. 


Definition A vector space $V$ is the (internal) direct sum of a family
$\mathcal{F} = \{S_i \mid i \in I\}$ of subspaces of $V$, written

$$V = \bigoplus \mathcal{F} \quad\text{or}\quad V = \bigoplus_{i \in I} S_i$$

if the following hold:

1) (Join of the family) $V$ is the sum (join) of the family $\mathcal{F}$:

$$V = \sum_{i \in I} S_i$$

2) (Independence of the family) For each $i \in I$,

$$S_i \cap \Big( \sum_{j \neq i} S_j \Big) = \{0\}$$

In this case, each $S_i$ is called a direct summand of $V$. If $\mathcal{F} = \{S_1, \ldots, S_n\}$ is a
finite family, the direct sum is often written

$$V = S_1 \oplus \cdots \oplus S_n$$

Finally, if $V = S \oplus T$, then $T$ is called a complement of $S$ in $V$. □


Note that the condition in part 2) of the previous definition is stronger than
saying simply that the members of $\mathcal{F}$ are pairwise disjoint:

$$S_i \cap S_j = \{0\}$$

for all $i \neq j \in I$.
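
A family of three subspaces already shows that the independence condition is strictly stronger than the pairwise condition. The sketch below uses the three lines in $V(2, 2)$, an example chosen here only because it is small enough to enumerate; `subspace_sum` is a hypothetical helper.

```python
ZERO = (0, 0)

def add(u, v):
    """Componentwise addition over F_2."""
    return tuple((a + b) % 2 for a, b in zip(u, v))

def subspace_sum(*subspaces):
    """Sum of finitely many subspaces of F_2^2, by brute force."""
    total = {ZERO}
    for W in subspaces:
        total = {add(u, w) for u in total for w in W}
    return total

S1 = {ZERO, (1, 0)}
S2 = {ZERO, (0, 1)}
S3 = {ZERO, (1, 1)}
family = [S1, S2, S3]

# Pairwise condition: S_i ∩ S_j = {0} for all i != j.
pairwise_trivial = all(
    family[i] & family[j] == {ZERO}
    for i in range(3) for j in range(3) if i != j
)

# Independence condition: S_i ∩ (sum of the other S_j) = {0} for every i.
independent = all(
    family[i] & subspace_sum(*(family[j] for j in range(3) if j != i)) == {ZERO}
    for i in range(3)
)
print(pairwise_trivial, independent)
```

The pairwise intersections are all trivial, yet $S_2 + S_3$ is the whole plane and therefore meets $S_1$ nontrivially, so the sum $S_1 + S_2 + S_3$ is not direct.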


A word of caution is in order here: If S and T are subspaces of V, then we may 
always say that the sum S + T exists. However, to say that the direct sum of S 
and $T$ exists or to write $S \oplus T$ is to imply that $S \cap T = \{0\}$. Thus, while the
sum of two subspaces always exists, the direct sum of two subspaces does not 
always exist. Similar statements apply to families of subspaces of V. 


The reader will be asked in a later chapter to show that the concepts of internal 
and external direct sum are essentially equivalent (isomorphic). For this reason, 
the term “direct sum” is often used without qualification. 


Once we have discussed the concept of a basis, the following theorem can be 
easily proved. 


Theorem 1.4 Any subspace of a vector space has a complement, that is, if $S$ is a
subspace of $V$, then there exists a subspace $T$ for which $V = S \oplus T$. □


It should be emphasized that a subspace generally has many complements 
(although they are isomorphic). The reader can easily find examples of this in 
$\mathbb{R}^2$.


We can characterize the uniqueness part of the definition of direct sum in other
useful ways. First a remark. If $S$ and $T$ are distinct subspaces of $V$ and if
$x, y \in S \cap T$, then the sum $x + y$ can be thought of as a sum of vectors from the
same subspace (say $S$) or from different subspaces, one from $S$ and one from
$T$. When we say that a vector $v$ cannot be written as a sum of vectors from the
distinct subspaces $S$ and $T$, we mean that $v$ cannot be written as a sum $x + y$
where $x$ and $y$ can be interpreted as coming from different subspaces, even if
they can also be interpreted as coming from the same subspace. Thus, if
$x, y \in S \cap T$, then $v = x + y$ does express $v$ as a sum of vectors from distinct
subspaces.


Theorem 1.5 Let $\mathcal{F} = \{S_i \mid i \in I\}$ be a family of distinct subspaces of $V$. The
following are equivalent:

1) (Independence of the family) For each $i \in I$,

$$S_i \cap \Big( \sum_{j \neq i} S_j \Big) = \{0\}$$

2) (Uniqueness of expression for 0) The zero vector 0 cannot be written as a
sum of nonzero vectors from distinct subspaces of $\mathcal{F}$.

3) (Uniqueness of expression) Every nonzero $v \in V$ has a unique, except for
order of terms, expression as a sum

$$v = s_1 + \cdots + s_n$$

of nonzero vectors from distinct subspaces in $\mathcal{F}$.

Hence, a sum

$$V = \sum_{i \in I} S_i$$

is direct if and only if any one of 1) to 3) holds.
Proof. Suppose that 2) fails, that is,

$$0 = s_{j_1} + \cdots + s_{j_n}$$

where the nonzero $s_{j_i}$'s are from distinct subspaces $S_{j_i}$. Then $n > 1$ and so

$$-s_{j_1} = s_{j_2} + \cdots + s_{j_n}$$

which violates 1). Hence, 1) implies 2). If 2) holds and

$$v = s_1 + \cdots + s_n \quad\text{and}\quad v = t_1 + \cdots + t_m$$

where the terms are nonzero and the $s_i$'s belong to distinct subspaces in $\mathcal{F}$ and
similarly for the $t_i$'s, then

$$0 = s_1 + \cdots + s_n - t_1 - \cdots - t_m$$

By collecting terms from the same subspaces, we may write

$$0 = (s_{i_1} - t_{i_1}) + \cdots + (s_{i_k} - t_{i_k}) + s_{i_{k+1}} + \cdots + s_{i_n} - t_{i_{k+1}} - \cdots - t_{i_m}$$

Then 2) implies that $n = m = k$ and $s_{i_u} = t_{i_u}$ for all $u = 1, \ldots, k$. Hence, 2)
implies 3).

Finally, suppose that 3) holds. If

$$0 \neq v \in S_i \cap \Big( \sum_{j \neq i} S_j \Big)$$

then $v = s_i \in S_i$ and

$$s_i = s_{j_1} + \cdots + s_{j_n}$$

where $s_{j_k} \in S_{j_k}$ are nonzero. But this violates 3). □

Example 1.5 Any matrix $A \in \mathcal{M}_n$ can be written in the form

$$A = \frac{1}{2}(A + A^t) + \frac{1}{2}(A - A^t) = B + C \tag{1.1}$$

where $A^t$ is the transpose of $A$. It is easy to verify that $B$ is symmetric and $C$ is
skew-symmetric and so (1.1) is a decomposition of $A$ as the sum of a symmetric
matrix and a skew-symmetric matrix.

Since the sets Sym and SkewSym of all symmetric and skew-symmetric
matrices in $\mathcal{M}_n$ are subspaces of $\mathcal{M}_n$, we have

$$\mathcal{M}_n = \text{Sym} + \text{SkewSym}$$

Furthermore, if $S + T = S' + T'$, where $S$ and $S'$ are symmetric and $T$ and $T'$
are skew-symmetric, then the matrix

$$U = S - S' = T' - T$$

is both symmetric and skew-symmetric. Hence, provided that $\operatorname{char}(F) \neq 2$, we
must have $U = 0$ and so $S = S'$ and $T = T'$. Thus,

$$\mathcal{M}_n = \text{Sym} \oplus \text{SkewSym}$$ □
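
The decomposition (1.1) is easy to carry out numerically. The sketch below works over the rationals using exact `Fraction` arithmetic (so that division by 2 is available, as the condition $\operatorname{char}(F) \neq 2$ requires); the matrix `A` and the helper names are made up for illustration.

```python
from fractions import Fraction

def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*A)]

def half_combination(A, B, sign):
    """Entrywise (A + sign*B) / 2 with exact rational arithmetic."""
    return [[Fraction(a + sign * b, 2) for a, b in zip(ra, rb)]
            for ra, rb in zip(A, B)]

def decompose(A):
    """Return (B, C) with A = B + C, B symmetric and C skew-symmetric."""
    At = transpose(A)
    return half_combination(A, At, 1), half_combination(A, At, -1)

def negate(M):
    return [[-x for x in row] for row in M]

A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 10]]
B, C = decompose(A)
recombined = [[b + c for b, c in zip(rb, rc)] for rb, rc in zip(B, C)]

print(B == transpose(B), transpose(C) == negate(C), recombined == A)
```

All three checks succeed: $B$ equals its transpose, $C$ equals the negative of its transpose, and $B + C$ recovers $A$.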


Spanning Sets and Linear Independence 

A set of vectors spans a vector space if every vector can be written as a linear 
combination of some of the vectors in that set. Here is the formal definition. 

Definition The subspace spanned (or subspace generated) by a nonempty set
$S$ of vectors in $V$ is the set of all linear combinations of vectors from $S$:

$$\langle S \rangle = \operatorname{span}(S) = \{r_1 v_1 + \cdots + r_n v_n \mid r_i \in F,\; v_i \in S\}$$

When $S = \{v_1, \ldots, v_n\}$ is a finite set, we use the notation $\langle v_1, \ldots, v_n \rangle$ or
$\operatorname{span}(v_1, \ldots, v_n)$. A set $S$ of vectors in $V$ is said to span $V$, or generate $V$, if
$V = \operatorname{span}(S)$. □


It is clear that any superset of a spanning set is also a spanning set. Note also 
that all vector spaces have spanning sets, since V spans itself. 


Linear Independence 


Linear independence is a fundamental concept. 


Definition Let $V$ be a vector space. A nonempty set $S$ of vectors in $V$ is
linearly independent if for any distinct vectors $s_1, \ldots, s_n$ in $S$,

$$a_1 s_1 + \cdots + a_n s_n = 0 \;\Rightarrow\; a_i = 0 \text{ for all } i$$

In words, $S$ is linearly independent if the only linear combination of vectors
from $S$ that is equal to 0 is the trivial linear combination, all of whose
coefficients are 0. If $S$ is not linearly independent, it is said to be linearly
dependent. □


It is immediate that a linearly independent set of vectors cannot contain the zero 
vector, since then 1 - 0 = 0 violates the condition of linear independence. 
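
For finitely many vectors in $\mathbb{Q}^n$, linear independence can be tested mechanically. The sketch below uses Gaussian elimination to compute a rank, a technique the text has not introduced at this point and which is assumed here only as a computational tool; the function names are illustrative.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors over Q, by Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0                                   # number of pivots found so far
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue                        # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def is_linearly_independent(vectors):
    """Distinct vectors are independent iff the rank equals their number."""
    return rank(vectors) == len(vectors)

triangular = is_linearly_independent([(1, 0, 0), (1, 1, 0), (1, 1, 1)])
arithmetic = is_linearly_independent([(1, 2, 3), (4, 5, 6), (7, 8, 9)])
with_zero = is_linearly_independent([(1, 0, 0), (0, 0, 0)])
print(triangular, arithmetic, with_zero)
```

The last call reflects the remark above: any set containing the zero vector is reported as dependent.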


Another way to phrase the definition of linear independence is to say that $S$ is
linearly independent if the zero vector has an “as unique as possible” expression
as a linear combination of vectors from $S$. We can never prevent the zero vector
from being written in the form $0 = 0s_1 + \cdots + 0s_n$, but we can prevent 0 from
being written in any other way as a linear combination of the vectors in $S$.

For the introspective reader, the expression $0 = s_1 + (-1 s_1)$ has two
interpretations. One is $0 = as_1 + bs_1$ where $a = 1$ and $b = -1$, but this does
not involve distinct vectors so is not relevant to the question of linear
independence. The other interpretation is $0 = s_1 + t_1$ where $t_1 = -s_1 \neq s_1$
(assuming that $s_1 \neq 0$). Thus, if $S$ is linearly independent, then $S$ cannot
contain both $s_1$ and $-s_1$.


Definition Let $S$ be a nonempty set of vectors in $V$. To say that a nonzero
vector $v \in V$ is an essentially unique linear combination of the vectors in $S$ is
to say that, up to order of terms, there is one and only one way to express $v$ as a
linear combination

$$v = a_1 s_1 + \cdots + a_n s_n$$

where the $s_i$'s are distinct vectors in $S$ and the coefficients $a_i$ are nonzero. More
explicitly, $v \neq 0$ is an essentially unique linear combination of the vectors in $S$
if $v \in \langle S \rangle$ and if whenever

$$v = a_1 s_1 + \cdots + a_n s_n \quad\text{and}\quad v = b_1 t_1 + \cdots + b_m t_m$$

where the $s_i$'s are distinct, the $t_i$'s are distinct and all coefficients are nonzero,
then $m = n$ and after a reindexing of the $b_i t_i$'s if necessary, we have $a_i = b_i$ and
$s_i = t_i$ for all $i = 1, \ldots, n$. (Note that this is stronger than saying that
$a_i s_i = b_i t_i$.) □


We may characterize linear independence as follows. 


Theorem 1.6 Let $S \neq \{0\}$ be a nonempty set of vectors in $V$. The following are
equivalent:

1) $S$ is linearly independent.

2) Every nonzero vector $v \in \operatorname{span}(S)$ is an essentially unique linear
combination of the vectors in $S$.

3) No vector in $S$ is a linear combination of other vectors in $S$.

Proof. Suppose that 1) holds and that

$$0 \neq v = a_1 s_1 + \cdots + a_n s_n = b_1 t_1 + \cdots + b_m t_m$$

where the $s_i$'s are distinct, the $t_i$'s are distinct and the coefficients are nonzero.
By subtracting and grouping $s$'s and $t$'s that are equal, we can write

$$0 = (a_{i_1} - b_{i_1}) s_{i_1} + \cdots + (a_{i_k} - b_{i_k}) s_{i_k} + a_{i_{k+1}} s_{i_{k+1}} + \cdots + a_{i_n} s_{i_n} - b_{i_{k+1}} t_{i_{k+1}} - \cdots - b_{i_m} t_{i_m}$$

and so 1) implies that $n = m = k$ and $a_{i_u} = b_{i_u}$ and $s_{i_u} = t_{i_u}$ for all $u = 1, \ldots, k$.
Thus, 1) implies 2).

If 2) holds and $s \in S$ can be written as

$$s = a_1 s_1 + \cdots + a_n s_n$$

where $s_i \in S$ are different from $s$, then we may collect like terms on the right
and then remove all terms with 0 coefficient. The resulting expression violates
2). Hence, 2) implies 3). If 3) holds and

$$a_1 s_1 + \cdots + a_n s_n = 0$$

where the $s_i$'s are distinct and $a_1 \neq 0$, then $n > 1$ and we may write

$$s_1 = -\frac{1}{a_1}(a_2 s_2 + \cdots + a_n s_n)$$

which violates 3). □


The following key theorem relates the notions of spanning set and linear 
independence. 




Theorem 1.7 Let $S$ be a set of vectors in $V$. The following are equivalent:

1) $S$ is linearly independent and spans $V$.

2) Every nonzero vector $v \in V$ is an essentially unique linear combination of
vectors in $S$.

3) $S$ is a minimal spanning set, that is, $S$ spans $V$ but any proper subset of $S$
does not span $V$.

4) $S$ is a maximal linearly independent set, that is, $S$ is linearly independent,
but any proper superset of $S$ is not linearly independent.

A set of vectors in $V$ that satisfies any (and hence all) of these conditions is
called a basis for $V$.

Proof. We have seen that 1) and 2) are equivalent. Now suppose 1) holds. Then
$S$ is a spanning set. If some proper subset $S'$ of $S$ also spanned $V$, then any
vector in $S - S'$ would be a linear combination of the vectors in $S'$,
contradicting the fact that the vectors in $S$ are linearly independent. Hence 1)
implies 3).


Conversely, if S is a minimal spanning set, then it must be linearly independent. 
For if not, some vector s € S would be a linear combination of the other vectors 
in S and so S — {s} would be a proper spanning subset of S, which is not 
possible. Hence 3) implies 1). 


Suppose again that 1) holds. If $S$ were not maximal, there would be a vector
$v \in V - S$ for which the set $S \cup \{v\}$ is linearly independent. But then $v$ is not
in the span of $S$, contradicting the fact that $S$ is a spanning set. Hence, $S$ is a
maximal linearly independent set and so 1) implies 4).

Conversely, if $S$ is a maximal linearly independent set, then $S$ must span $V$, for
if not, we could find a vector $v \in V - S$ that is not a linear combination of the
vectors in $S$. Hence, $S \cup \{v\}$ would be a linearly independent proper superset of
$S$, which is a contradiction. Thus, 4) implies 1). □


Theorem 1.8 A finite set $S = \{v_1, \ldots, v_n\}$ of vectors in $V$ is a basis for $V$ if
and only if

$$V = \langle v_1 \rangle \oplus \cdots \oplus \langle v_n \rangle$$ □

Example 1.6 The $i$th standard vector in $F^n$ is the vector $e_i$ that has 0's in all
coordinate positions except the $i$th, where it has a 1. Thus,

$$e_1 = (1, 0, \ldots, 0),\quad e_2 = (0, 1, \ldots, 0),\quad \ldots,\quad e_n = (0, \ldots, 0, 1)$$

The set $\{e_1, \ldots, e_n\}$ is called the standard basis for $F^n$. □


The proof that every nontrivial vector space has a basis is a classic example of 
the use of Zorn's lemma. 




Theorem 1.9 Let V be a nonzero vector space. Let I be a linearly independent 
set in V and let S be a spanning set in V containing I. Then there is a basis B 
for V for which $I \subseteq B \subseteq S$. In particular,

1) Any vector space, except the zero space {0}, has a basis. 

2) Any linearly independent set in V is contained in a basis. 

3) Any spanning set in V contains a basis. 

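Proof. Consider the collection $\mathcal{A}$ of all linearly independent subsets of V
containing I and contained in S. This collection is not empty, since $I \in \mathcal{A}$.
Now, if

$$\mathcal{C} = \{I_k \mid k \in K\}$$

is a chain in $\mathcal{A}$, then the union

$$U = \bigcup_{k \in K} I_k$$

is linearly independent (any finite subset of U lies in a single $I_k$, because the
$I_k$ form a chain) and satisfies $I \subseteq U \subseteq S$, that is, $U \in \mathcal{A}$. Hence, every
chain in $\mathcal{A}$ has an upper bound in $\mathcal{A}$ and according to Zorn's lemma, $\mathcal{A}$ must
contain a maximal element B, which is linearly independent.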


Now, B is a basis for the vector space $\langle S \rangle = V$, for if any $s \in S$ is not a linear
combination of the elements of B, then $B \cup \{s\} \subseteq S$ is linearly independent,
contradicting the maximality of B. Hence $S \subseteq \langle B \rangle$ and so $V = \langle S \rangle \subseteq \langle B \rangle$. □


The reader can now show, using Theorem 1.9, that any subspace of a vector 
space has a complement. 


The Dimension of a Vector Space 


The next result, with its classical elegant proof, says that if a vector space V has 
a finite spanning set S, then the size of any linearly independent set cannot 
exceed the size of S. 


Theorem 1.10 Let V be a vector space and assume that the vectors $v_1, \dots, v_n$
are linearly independent and the vectors $s_1, \dots, s_m$ span V. Then $n \le m$.

Proof. First, we list the two sets of vectors: the spanning set followed by the
linearly independent set:

$$s_1, \dots, s_m; \; v_1, \dots, v_n$$

Then we move the first vector $v_1$ to the front of the first list:

$$v_1, s_1, \dots, s_m; \; v_2, \dots, v_n$$

Since $s_1, \dots, s_m$ span V, $v_1$ is a linear combination of the $s_i$'s. This implies that
we may remove one of the $s_i$'s, which by reindexing if necessary can be $s_1$,
from the first list and still have a spanning set

$$v_1, s_2, \dots, s_m; \; v_2, \dots, v_n$$




Note that the first set of vectors still spans V and the second set is still linearly 
independent. 


Now we repeat the process, moving $v_2$ from the second list to the first list

$$v_1, v_2, s_2, \dots, s_m; \; v_3, \dots, v_n$$

As before, the vectors in the first list are linearly dependent, since they spanned
V before the inclusion of $v_2$. However, since the $v_i$'s are linearly independent,
any nontrivial linear combination of the vectors in the first list that equals 0
must involve at least one of the $s_i$'s. Hence, we may remove that vector, which
again by reindexing if necessary may be taken to be $s_2$ and still have a spanning
set

$$v_1, v_2, s_3, \dots, s_m; \; v_3, \dots, v_n$$


Once again, the first set of vectors spans V and the second set is still linearly 
independent. 


Now, if $m < n$, then this process will eventually exhaust the $s_i$'s and lead to the
list

$$v_1, v_2, \dots, v_m; \; v_{m+1}, \dots, v_n$$

where $v_1, v_2, \dots, v_m$ span V, which is clearly not possible since $v_n$ is not in the
span of $v_1, v_2, \dots, v_m$. Hence, $n \le m$. □


Corollary 1.11 If V has a finite spanning set, then any two bases of V have the
same size. □


Now let us prove the analogue of Corollary 1.11 for arbitrary vector spaces. 


Theorem 1.12 If V is a vector space, then any two bases for V have the same
cardinality. 

Proof. We may assume that all bases for V are infinite sets, for if any basis is 
finite, then V has a finite spanning set and so Corollary 1.11 applies. 


Let $B = \{b_i \mid i \in I\}$ be a basis for V and let C be another basis for V. Then any
vector $c \in C$ can be written as a finite linear combination of the vectors in B,
where all of the coefficients are nonzero, say

$$c = \sum_{i \in U_c} r_i b_i$$

But because C is a basis, we must have

$$\bigcup_{c \in C} U_c = I$$

for if the vectors in C can be expressed as finite linear combinations of the
vectors in a proper subset $B'$ of B, then $B'$ spans V, which is not the case.
Since $|U_c| < \aleph_0$ for all $c \in C$, Theorem 0.17 implies that

$$|B| = |I| \le \aleph_0 |C| = |C|$$

But we may also reverse the roles of B and C, to conclude that $|C| \le |B|$ and so
the Schröder-Bernstein theorem implies that $|B| = |C|$. □


Theorem 1.12 allows us to make the following definition. 


Definition A vector space V is finite-dimensional if it is the zero space {0}, or 
if it has a finite basis. All other vector spaces are infinite-dimensional. The 
dimension of the zero space is 0 and the dimension of any nonzero vector 
space V is the cardinality of any basis for V. If a vector space V has a basis of
cardinality $\kappa$, we say that V is $\kappa$-dimensional and write $\dim(V) = \kappa$. □


It is easy to see that if S is a subspace of V, then $\dim(S) \le \dim(V)$. If in
addition, $\dim(S) = \dim(V) < \infty$, then S = V.


Theorem 1.13 Let V be a vector space.
1) If B is a basis for V and if $B = B_1 \cup B_2$ and $B_1 \cap B_2 = \emptyset$, then

$$V = \langle B_1 \rangle \oplus \langle B_2 \rangle$$

2) Let $V = S \oplus T$. If $B_1$ is a basis for S and $B_2$ is a basis for T, then
$B_1 \cap B_2 = \emptyset$ and $B = B_1 \cup B_2$ is a basis for V. □


Theorem 1.14 Let S and T be subspaces of a vector space V. Then

$$\dim(S) + \dim(T) = \dim(S + T) + \dim(S \cap T)$$

In particular, if T is any complement of S in V, then

$$\dim(S) + \dim(T) = \dim(V)$$

that is,

$$\dim(S \oplus T) = \dim(S) + \dim(T)$$


Proof. Suppose that $B = \{b_i \mid i \in I\}$ is a basis for $S \cap T$. Extend this to a basis
$A \cup B$ for S where $A = \{a_j \mid j \in J\}$ is disjoint from B. Also, extend B to a
basis $B \cup C$ for T where $C = \{c_k \mid k \in K\}$ is disjoint from B. We claim that
$A \cup B \cup C$ is a basis for $S + T$. It is clear that $\langle A \cup B \cup C \rangle = S + T$.


To see that $A \cup B \cup C$ is linearly independent, suppose to the contrary that

$$\alpha_1 v_1 + \cdots + \alpha_n v_n = 0$$

where $v_i \in A \cup B \cup C$ and $\alpha_i \ne 0$ for all i. There must be vectors $v_i$ in this
expression from both A and C, since $A \cup B$ and $B \cup C$ are linearly independent.
Isolating the terms involving the vectors from A on one side of the equality
shows that there is a nonzero vector $x \in \langle A \rangle \cap \langle B \cup C \rangle$. But then $x \in S \cap T$
and so $x \in \langle A \rangle \cap \langle B \rangle$, which implies that $x = 0$, a contradiction. Hence,
$A \cup B \cup C$ is linearly independent and a basis for $S + T$.


Now,

$$\begin{aligned}
\dim(S) + \dim(T) &= |A \cup B| + |B \cup C| \\
&= |A| + |B| + |B| + |C| \\
&= |A| + |B| + |C| + \dim(S \cap T) \\
&= \dim(S + T) + \dim(S \cap T)
\end{aligned}$$

as desired.
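
As a small sanity check of the dimension formula (an illustrative example that is not part of the original text), take $V = \mathbb{R}^3$ with standard vectors $e_1, e_2, e_3$ and two coordinate planes S and T chosen for convenience:

```latex
\[
S = \langle e_1, e_2 \rangle, \qquad
T = \langle e_2, e_3 \rangle, \qquad
S \cap T = \langle e_2 \rangle, \qquad
S + T = \mathbb{R}^3
\]
\[
\dim(S) + \dim(T) = 2 + 2 = 4 = 3 + 1 = \dim(S + T) + \dim(S \cap T)
\]
```

In the notation of the proof, this corresponds to the sets $B = \{e_2\}$, $A = \{e_1\}$ and $C = \{e_3\}$.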


It is worth emphasizing that while the equation

$$\dim(S) + \dim(T) = \dim(S + T) + \dim(S \cap T)$$

holds for all vector spaces, we cannot write

$$\dim(S + T) = \dim(S) + \dim(T) - \dim(S \cap T)$$

unless $S + T$ is finite-dimensional.


Ordered Bases and Coordinate Matrices


It will be convenient to consider bases that have an order imposed on their 
members. 


Definition Let V be a vector space of dimension n. An ordered basis for V is
an ordered n-tuple $(v_1, \dots, v_n)$ of vectors for which the set $\{v_1, \dots, v_n\}$ is a
basis for V. □


If $B = (v_1, \dots, v_n)$ is an ordered basis for V, then for each $v \in V$ there is a
unique ordered n-tuple $(r_1, \dots, r_n)$ of scalars for which

$$v = r_1 v_1 + \cdots + r_n v_n$$

Accordingly, we can define the coordinate map $\phi_B : V \to F^n$ by

$$\phi_B(v) = [v]_B = \begin{pmatrix} r_1 \\ \vdots \\ r_n \end{pmatrix} \tag{1.3}$$

where the column matrix $[v]_B$ is known as the coordinate matrix of v with
respect to the ordered basis B. Clearly, knowing $[v]_B$ is equivalent to knowing v
(assuming knowledge of B).


Furthermore, it is easy to see that the coordinate map $\phi_B$ is bijective and
preserves the vector space operations, that is,

$$\phi_B(r_1 v_1 + \cdots + r_n v_n) = r_1 \phi_B(v_1) + \cdots + r_n \phi_B(v_n)$$

or equivalently

$$[r_1 v_1 + \cdots + r_n v_n]_B = r_1 [v_1]_B + \cdots + r_n [v_n]_B$$


Functions from one vector space to another that preserve the vector space 
operations are called linear transformations and form the objects of study in the 
next chapter. 


The Row and Column Spaces of a Matrix 


Let A be an $m \times n$ matrix over F. The rows of A span a subspace of $F^n$ known
as the row space of A and the columns of A span a subspace of $F^m$ known as
the column space of A. The dimensions of these spaces are called the row rank
and column rank, respectively. We denote the row space and row rank by
rs(A) and rrk(A) and the column space and column rank by cs(A) and crk(A).


It is a remarkable and useful fact that the row rank of a matrix is always equal to
its column rank, despite the fact that if $m \ne n$, the row space and column space
are not even in the same vector space!


Our proof of this fact hinges on the following simple observation about 
matrices. 


Lemma 1.15 Let A be an $m \times n$ matrix. Then elementary column operations do
not affect the row rank of A. Similarly, elementary row operations do not affect
the column rank of A.

Proof. The second statement follows from the first by taking transposes. As to
the first, the row space of A is

$$\mathrm{rs}(A) = \langle e_1 A, \dots, e_m A \rangle$$

where $e_i$ are the standard basis vectors in $F^m$. Performing an elementary
column operation on A is equivalent to multiplying A on the right by an
elementary matrix E. Hence the row space of AE is

$$\mathrm{rs}(AE) = \langle e_1 AE, \dots, e_m AE \rangle$$

and since E is invertible,

$$\mathrm{rrk}(A) = \dim(\mathrm{rs}(A)) = \dim(\mathrm{rs}(AE)) = \mathrm{rrk}(AE)$$

as desired. □


Theorem 1.16 If $A \in M_{m,n}$ then rrk(A) = crk(A). This number is called the
rank of A and is denoted by rk(A).

Proof. According to the previous lemma, we may reduce A to reduced column
echelon form without affecting the row rank. But this reduction does not affect
the column rank either. Then we may further reduce A to reduced row echelon
form without affecting either rank. The resulting matrix M has the same row
and column ranks as A. But M is a matrix with 1's followed by 0's on the main
diagonal (entries $M_{1,1}, M_{2,2}, \dots$) and 0's elsewhere. Hence,

$$\mathrm{rrk}(A) = \mathrm{rrk}(M) = \mathrm{crk}(M) = \mathrm{crk}(A)$$

as desired. □
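
The equality of row rank and column rank can be checked numerically on a small example. The following is a minimal sketch, assuming exact rational arithmetic and plain Gaussian elimination; the helper names `rank` and `transpose` and the sample matrix are illustrative choices, not taken from the text.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (given as a list of rows) by Gaussian elimination over Q."""
    m = [[Fraction(x) for x in row] for row in rows]
    found = 0                                  # number of pivots found so far
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Look for a nonzero entry in this column at or below the next pivot row.
        pivot = next((r for r in range(found, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[found], m[pivot] = m[pivot], m[found]
        for r in range(len(m)):
            if r != found and m[r][col] != 0:
                factor = m[r][col] / m[found][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[found])]
        found += 1
    return found


def transpose(rows):
    """Swap rows and columns, so the row rank of the result is crk of the input."""
    return [list(col) for col in zip(*rows)]


# Sample 3 x 4 matrix: the second row is twice the first.
A = [[1, 2, 3, 4],
     [2, 4, 6, 8],
     [0, 1, 1, 0]]

row_rank = rank(A)
col_rank = rank(transpose(A))
```

Here the rows live in $F^4$ and the columns in $F^3$, yet both ranks come out equal, as Theorem 1.16 asserts.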
The Complexification of a Real Vector Space 


If W is a complex vector space (that is, a vector space over $\mathbb{C}$), then we can
think of W as a real vector space simply by restricting all scalars to the field $\mathbb{R}$.
Let us denote this real vector space by $W_{\mathbb{R}}$ and call it the real version of W.


On the other hand, to each real vector space V, we can associate a complex
vector space $V^{\mathbb{C}}$. This "complexification" process will play a useful role when
we discuss the structure of linear operators on a real vector space. (Throughout
our discussion V will denote a real vector space.)


Definition If V is a real vector space, then the set $V^{\mathbb{C}} = V \times V$ of ordered
pairs, with componentwise addition

$$(u, v) + (x, y) = (u + x, v + y)$$

and scalar multiplication over $\mathbb{C}$ defined by

$$(a + bi)(u, v) = (au - bv, av + bu)$$

for $a, b \in \mathbb{R}$ is a complex vector space, called the complexification of V. □


It is convenient to introduce a notation for vectors in $V^{\mathbb{C}}$ that resembles the
notation for complex numbers. In particular, we denote $(u, v) \in V^{\mathbb{C}}$ by $u + vi$
and so

$$V^{\mathbb{C}} = \{u + vi \mid u, v \in V\}$$

Addition now looks like ordinary addition of complex numbers,

$$(u + vi) + (x + yi) = (u + x) + (v + y)i$$

and scalar multiplication looks like ordinary multiplication of complex numbers,

$$(a + bi)(u + vi) = (au - bv) + (av + bu)i$$

Thus, for example, we immediately have for $a, b \in \mathbb{R}$,

$$\begin{aligned}
a(u + vi) &= au + avi \\
bi(u + vi) &= -bv + bui \\
(a + bi)u &= au + bui \\
(a + bi)vi &= -bv + avi
\end{aligned}$$

The real part of $z = u + vi$ is $u \in V$ and the imaginary part of z is $v \in V$.
The essence of the fact that $z = u + vi \in V^{\mathbb{C}}$ is really an ordered pair is that z is
0 if and only if its real and imaginary parts are both 0.
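
The pair arithmetic above can be tried out directly. The following is a minimal sketch, assuming $V = \mathbb{R}^n$ with vectors stored as tuples of floats and a vector $u + vi$ stored as the pair `(u, v)`; the function names `cpx`, `add` and `scale` are illustrative, not notation from the text.

```python
def cpx(v):
    """Complexification map v -> v + 0i, stored as the pair (v, 0)."""
    return (tuple(v), tuple(0.0 for _ in v))


def add(z, w):
    """(u + vi) + (x + yi) = (u + x) + (v + y)i, componentwise."""
    (u, v), (x, y) = z, w
    return (tuple(p + q for p, q in zip(u, x)),
            tuple(p + q for p, q in zip(v, y)))


def scale(c, z):
    """(a + bi)(u + vi) = (au - bv) + (av + bu)i for a complex scalar c = a + bi."""
    a, b = complex(c).real, complex(c).imag
    u, v = z
    return (tuple(a * p - b * q for p, q in zip(u, v)),
            tuple(a * q + b * p for p, q in zip(u, v)))


z = ((1.0, 2.0), (3.0, 4.0))      # the vector (1, 2) + (3, 4)i
i_times_z = scale(1j, z)          # real part -(3, 4), imaginary part (1, 2)
minus_z = scale(1j, i_times_z)    # multiplying by i twice negates z
```

Multiplying by i twice gives the same result as multiplying by −1, which is the compatibility one expects of a scalar multiplication over $\mathbb{C}$.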
We can define the complexification map $\mathrm{cpx} : V \to V^{\mathbb{C}}$ by

$$\mathrm{cpx}(v) = v + 0i$$

Let us refer to $v + 0i$ as the complexification, or complex version of $v \in V$.
Note that this map is a group homomorphism, that is,

$$\mathrm{cpx}(0) = 0 + 0i \quad \text{and} \quad \mathrm{cpx}(u + v) = \mathrm{cpx}(u) + \mathrm{cpx}(v)$$

and it is injective:

$$\mathrm{cpx}(u) = \mathrm{cpx}(v) \iff u = v$$

Also, it preserves multiplication by real scalars:

$$\mathrm{cpx}(au) = au + 0i = a(u + 0i) = a\,\mathrm{cpx}(u)$$

for $a \in \mathbb{R}$. However, the complexification map is not surjective, since it gives
only "real" vectors in $V^{\mathbb{C}}$.


The complexification map is an injective linear transformation (defined in the
next chapter) from the real vector space V to the real version $(V^{\mathbb{C}})_{\mathbb{R}}$ of the
complexification $V^{\mathbb{C}}$, that is, to the complex vector space $V^{\mathbb{C}}$ provided that
scalars are restricted to real numbers. In this way, we see that $V^{\mathbb{C}}$ contains an
embedded copy of V.


The Dimension of $V^{\mathbb{C}}$


The vector-space dimensions of V and $V^{\mathbb{C}}$ are the same. This should not
necessarily come as a surprise because although $V^{\mathbb{C}}$ may seem "bigger" than V,
the field of scalars is also "bigger."


Theorem 1.17 If $B = \{v_j \mid j \in I\}$ is a basis for V over $\mathbb{R}$, then the
complexification of B,

$$\mathrm{cpx}(B) = \{v_j + 0i \mid v_j \in B\}$$

is a basis for the vector space $V^{\mathbb{C}}$ over $\mathbb{C}$. Hence,

$$\dim(V^{\mathbb{C}}) = \dim(V)$$


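Proof. To see that cpx(B) spans $V^{\mathbb{C}}$ over $\mathbb{C}$, let $x + yi \in V^{\mathbb{C}}$. Then $x, y \in V$
and so there exist real numbers $a_j$ and $b_j$ (some of which may be 0) for which

$$\begin{aligned}
x + yi &= \sum_{j=1}^{J} a_j v_j + \Bigl( \sum_{j=1}^{J} b_j v_j \Bigr) i \\
&= \sum_{j=1}^{J} (a_j v_j + b_j v_j i) \\
&= \sum_{j=1}^{J} (a_j + b_j i)(v_j + 0i)
\end{aligned}$$

To see that cpx(B) is linearly independent, if

$$\sum_{j=1}^{J} (a_j + b_j i)(v_j + 0i) = 0 + 0i$$

then the previous computations show that

$$\sum_{j=1}^{J} a_j v_j = 0 \quad \text{and} \quad \sum_{j=1}^{J} b_j v_j = 0$$

The independence of B then implies that $a_j = 0$ and $b_j = 0$ for all j. □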


If $v \in V$ and $B = \{v_i \mid i \in I\}$ is a basis for V, then we may write

$$v = \sum_{i=1}^{n} a_i v_i$$

for $a_i \in \mathbb{R}$. Since the coefficients are real, we have

$$v + 0i = \sum_{i=1}^{n} a_i (v_i + 0i)$$

and so the coordinate matrices are equal:

$$[v + 0i]_{\mathrm{cpx}(B)} = [v]_B$$


Exercises 


1. Let V be a vector space over F. Prove that $0v = 0$ and $r0 = 0$ for all $v \in V$
   and $r \in F$. Describe the different 0's in these equations. Prove that if
   $rv = 0$, then $r = 0$ or $v = 0$. Prove that $rv = v$ implies that $v = 0$ or $r = 1$.


2. Prove Theorem 1.3.

3. a) Find an abelian group V and a field F for which V is a vector space
      over F in at least two different ways, that is, there are two different
      definitions of scalar multiplication making V a vector space over F.
   b) Find a vector space V over F and a subset S of V that is (1) a
      subspace of V and (2) a vector space using operations that differ from
      those of V.

4. Suppose that V is a vector space with basis $B = \{b_i \mid i \in I\}$ and S is a
   subspace of V. Let $\{B_1, \dots, B_k\}$ be a partition of B. Then is it true that

   $$S = \bigoplus_{i=1}^{k} (S \cap \langle B_i \rangle)$$

   What if $S \cap \langle B_i \rangle \ne \{0\}$ for all i?

5. Prove Theorem 1.8.

6. Let $S, T, U \in \mathcal{S}(V)$. Show that if $U \subseteq S$, then

   $$S \cap (T + U) = (S \cap T) + U$$

   This is called the modular law for the lattice $\mathcal{S}(V)$.

7. For what vector spaces does the distributive law of subspaces

   $$S \cap (T + U) = (S \cap T) + (S \cap U)$$

   hold?

8. A vector $v = (a_1, \dots, a_n) \in \mathbb{R}^n$ is called strongly positive if $a_i > 0$ for all
   $i = 1, \dots, n$.
   a) Suppose that v is strongly positive. Show that any vector that is "close
      enough" to v is also strongly positive. (Formulate carefully what "close
      enough" should mean.)
   b) Prove that if a subspace S of $\mathbb{R}^n$ contains a strongly positive vector,
      then S has a basis of strongly positive vectors.

9. Let M be an $m \times n$ matrix whose rows are linearly independent. Suppose
   that the k columns $c_{i_1}, \dots, c_{i_k}$ of M span the column space of M. Let C be
   the matrix obtained from M by deleting all columns except $c_{i_1}, \dots, c_{i_k}$.
   Show that the rows of C are also linearly independent.

10. Prove that the first two statements in Theorem 1.7 are equivalent.

11. Show that if S is a subspace of a vector space V, then $\dim(S) \le \dim(V)$.
    Furthermore, if $\dim(S) = \dim(V) < \infty$ then S = V. Give an example to
    show that the finiteness is required in the second statement.

12. Let $\dim(V) < \infty$ and suppose that $V = U \oplus S_1 = U \oplus S_2$. What can you
    say about the relationship between $S_1$ and $S_2$? What can you say if
    $S_1 \subseteq S_2$?

13. What is the relationship between $S \oplus T$ and $T \oplus S$? Is the direct sum
    operation commutative? Formulate and prove a similar statement
    concerning associativity. Is there an "identity" for direct sum? What about
    "negatives"?


14. Let V be a finite-dimensional vector space over an infinite field F. Prove
    that if $S_1, \dots, S_k$ are subspaces of V of equal dimension, then there is a
    subspace T of V for which $V = S_i \oplus T$ for all $i = 1, \dots, k$. In other words,
    T is a common complement of the subspaces $S_i$.

15. Prove that the vector space C of all continuous functions from $\mathbb{R}$ to $\mathbb{R}$ is
    infinite-dimensional.

16. Show that Theorem 1.2 need not hold if the base field F is finite.

17. Let S be a subspace of V. The set $v + S = \{v + s \mid s \in S\}$ is called an
    affine subspace of V.
    a) Under what conditions is an affine subspace of V a subspace of V?
    b) Show that any two affine subspaces of the form $v + S$ and $w + S$ are
       either equal or disjoint.

18. If V and W are vector spaces over F for which $|V| = |W|$, then does it
    follow that $\dim(V) = \dim(W)$?

19. Let V be an n-dimensional real vector space and suppose that S is a
    subspace of V with $\dim(S) = n - 1$. Define an equivalence relation $\equiv$ on
    the set $V \setminus S$ by $v \equiv w$ if the "line segment"

    $$L(v, w) = \{rv + (1 - r)w \mid 0 \le r \le 1\}$$

    has the property that $L(v, w) \cap S = \emptyset$. Prove that $\equiv$ is an equivalence
    relation and that it has exactly two equivalence classes.

20. Let F be a field. A subfield of F is a subset K of F that is a field in its
    own right using the same operations as defined on F.
    a) Show that F is a vector space over any subfield K of F.
    b) Suppose that F is an m-dimensional vector space over a subfield K of
       F. If V is an n-dimensional vector space over F, show that V is also a
       vector space over K. What is the dimension of V as a vector space
       over K?

21. Let F be a finite field of size q and let V be an n-dimensional vector space
    over F. The purpose of this exercise is to show that the number of
    subspaces of V of dimension k is

    $$\binom{n}{k}_q = \frac{(q^n - 1) \cdots (q - 1)}{(q^k - 1) \cdots (q - 1)\,(q^{n-k} - 1) \cdots (q - 1)}$$

    The expressions $\binom{n}{k}_q$ are called Gaussian coefficients and have properties
    similar to those of the binomial coefficients. Let $S(n, k)$ be the number of
    k-dimensional subspaces of V.
    a) Let $N(n, k)$ be the number of k-tuples of linearly independent vectors
       $(v_1, \dots, v_k)$ in V. Show that

       $$N(n, k) = (q^n - 1)(q^n - q) \cdots (q^n - q^{k-1})$$

    b) Now, each of the k-tuples in a) can be obtained by first choosing a
       subspace of V of dimension k and then selecting the vectors from this
       subspace. Show that for any k-dimensional subspace of V, the number
       of k-tuples of independent vectors in this subspace is

       $$(q^k - 1)(q^k - q) \cdots (q^k - q^{k-1})$$

    c) Show that

       $$N(n, k) = S(n, k)(q^k - 1)(q^k - q) \cdots (q^k - q^{k-1})$$

       How does this complete the proof?

22. Prove that any subspace S of $\mathbb{R}^n$ is a closed set or, equivalently, that its set
    complement $S^c = \mathbb{R}^n \setminus S$ is open, that is, for any $x \in S^c$ there is an open
    ball $B(x, \epsilon)$ centered at x with radius $\epsilon > 0$ for which $B(x, \epsilon) \subseteq S^c$.

23. Let $B = \{b_1, \dots, b_n\}$ and $C = \{c_1, \dots, c_n\}$ be bases for a vector space V.
    Let $1 \le m \le n - 1$. Show that there is a permutation $\sigma$ of $\{1, \dots, n\}$ such
    that

    $$b_1, \dots, b_m, c_{\sigma(m+1)}, \dots, c_{\sigma(n)}$$

    and

    $$c_{\sigma(1)}, \dots, c_{\sigma(m)}, b_{m+1}, \dots, b_n$$

    are both bases for V. Hint: You may use the fact that if M is an invertible
    $n \times n$ matrix and if $1 \le k \le n$, then it is possible to reorder the rows so
    that the upper left $k \times k$ submatrix and the lower right $(n - k) \times (n - k)$
    submatrix are both invertible. (This follows, for example, from the general
    Laplace expansion theorem for determinants.)

24. Let V be an n-dimensional vector space over an infinite field F and
    suppose that $S_1, \dots, S_k$ are subspaces of V with $\dim(S_i) \le m < n$. Prove
    that there is a subspace T of V of dimension $n - m$ for which
    $T \cap S_i = \{0\}$ for all i.

25. What is the dimension of the complexification $V^{\mathbb{C}}$ thought of as a real
    vector space?

26. (When is a subspace of a complex vector space a complexification?) Let V
    be a real vector space with complexification $V^{\mathbb{C}}$ and let U be a subspace of
    $V^{\mathbb{C}}$. Prove that there is a subspace S of V for which

    $$U = S^{\mathbb{C}} = \{s + ti \mid s, t \in S\}$$

    if and only if U is closed under complex conjugation $\chi : V^{\mathbb{C}} \to V^{\mathbb{C}}$ defined
    by $\chi(u + iv) = u - iv$.


Chapter 2 
Linear Transformations 


Linear Transformations 


Loosely speaking, a linear transformation is a function from one vector space to 
another that preserves the vector space operations. Let us be more precise. 


Definition Let V and W be vector spaces over a field F. A function $\tau : V \to W$
is a linear transformation if

$$\tau(ru + sv) = r\tau(u) + s\tau(v)$$

for all scalars $r, s \in F$ and vectors $u, v \in V$. The set of all linear
transformations from V to W is denoted by L(V, W).

1) A linear transformation from V to V is called a linear operator on V. The
   set of all linear operators on V is denoted by L(V). A linear operator on a
   real vector space is called a real operator and a linear operator on a
   complex vector space is called a complex operator.

2) A linear transformation from V to the base field F (thought of as a vector
   space over itself) is called a linear functional on V. The set of all linear
   functionals on V is denoted by $V^*$ and called the dual space of V. □


We should mention that some authors use the term linear operator for any linear
transformation from V to W. Also, the application of a linear transformation $\tau$
on a vector v is denoted by $\tau(v)$ or by $\tau v$, parentheses being used when
necessary, as in $\tau(u + v)$, or to improve readability, as in $\sigma(\tau u)$ rather than
$\sigma(\tau(u))$.


Definition The following terms are also employed: 

1) homomorphism for linear transformation 

2) endomorphism for linear operator 

3) monomorphism (or embedding) for injective linear transformation

4) epimorphism for surjective linear transformation

5) isomorphism for bijective linear transformation

6) automorphism for bijective linear operator. □


Example 2.1

1) The derivative $D : V \to V$ is a linear operator on the vector space V of all
   infinitely differentiable functions on $\mathbb{R}$.

2) The integral operator $\tau : F[x] \to F[x]$ defined by

   $$\tau f = \int_0^x f(t)\,dt$$

   is a linear operator on $F[x]$.

3) Let A be an $m \times n$ matrix over F. The function $\tau_A : F^n \to F^m$ defined by
   $\tau_A v = Av$, where all vectors are written as column vectors, is a linear
   transformation from $F^n$ to $F^m$. This function is just multiplication by A.

4) The coordinate map $\phi : V \to F^n$ of an n-dimensional vector space is a
   linear transformation from V to $F^n$. □


The set L(V, W) is a vector space in its own right and L(V) has the structure of 
an algebra, as defined in Chapter 0. 


Theorem 2.1

1) The set L(V, W) is a vector space under ordinary addition of functions
   and scalar multiplication of functions by elements of F.

2) If $\sigma \in L(U, V)$ and $\tau \in L(V, W)$, then the composition $\tau\sigma$ is in L(U, W).

3) If $\tau \in L(V, W)$ is bijective then $\tau^{-1} \in L(W, V)$.

4) The vector space L(V) is an algebra, where multiplication is composition
   of functions. The identity map $\iota \in L(V)$ is the multiplicative identity and
   the zero map $0 \in L(V)$ is the additive identity.

Proof. We prove only part 3). Let $\tau : V \to W$ be a bijective linear
transformation. Then $\tau^{-1} : W \to V$ is a well-defined function and since any two
vectors $w_1$ and $w_2$ in W have the form $w_1 = \tau v_1$ and $w_2 = \tau v_2$, we have

$$\begin{aligned}
\tau^{-1}(aw_1 + bw_2) &= \tau^{-1}(a\tau v_1 + b\tau v_2) \\
&= \tau^{-1}(\tau(av_1 + bv_2)) \\
&= av_1 + bv_2 \\
&= a\tau^{-1}(w_1) + b\tau^{-1}(w_2)
\end{aligned}$$

which shows that $\tau^{-1}$ is linear. □


One of the easiest ways to define a linear transformation is to give its values on 
a basis. The following theorem says that we may assign these values arbitrarily 
and obtain a unique linear transformation by linear extension to the entire 
domain. 


Theorem 2.2 Let V and W be vector spaces and let $B = \{v_i \mid i \in I\}$ be a
basis for V. Then we can define a linear transformation $\tau \in L(V, W)$ by
specifying the values of $\tau v_i$ arbitrarily for all $v_i \in B$ and extending $\tau$ to V by
linearity, that is,

$$\tau(a_1 v_1 + \cdots + a_n v_n) = a_1 \tau v_1 + \cdots + a_n \tau v_n$$

This process defines a unique linear transformation, that is, if $\tau, \sigma \in L(V, W)$
satisfy $\tau v_i = \sigma v_i$ for all $v_i \in B$ then $\tau = \sigma$.

Proof. The crucial point is that the extension by linearity is well-defined, since
each vector in V has an essentially unique representation as a linear
combination of a finite number of vectors in B. We leave the details to the
reader. □
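
Extension by linearity is easy to illustrate when V has a finite basis. The following is a minimal sketch, assuming $V = \mathbb{Q}^n$ with its standard basis and vectors stored as coordinate lists; the function name `linear_extension` and the chosen images are hypothetical.

```python
def linear_extension(images):
    """Return the map tau with tau(e_i) = images[i], extended by linearity.

    A vector v is given by its coordinates (a_1, ..., a_n) with respect to the
    standard basis, so tau(v) = a_1*tau(e_1) + ... + a_n*tau(e_n).
    """
    dim_w = len(images[0])

    def tau(v):
        result = [0] * dim_w
        for coeff, image in zip(v, images):
            for k in range(dim_w):
                result[k] += coeff * image[k]
        return result

    return tau


# Arbitrary (hypothetical) choice of values on the standard basis of Q^2,
# taking values in W = Q^3.
tau = linear_extension([[1, 0, 1], [-2, 0, 1]])
value = tau([3, 4])   # 3*tau(e_1) + 4*tau(e_2)
```

Any other choice of images would give an equally valid linear transformation, which is the "arbitrarily" in the statement of the theorem.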


Note that if $\tau \in L(V, W)$ and if S is a subspace of V, then the restriction $\tau|_S$ of
$\tau$ to S is a linear transformation from S to W.


The Kernel and Image of a Linear Transformation 


There are two very important vector spaces associated with a linear
transformation $\tau$ from V to W.


Definition Let $\tau \in L(V, W)$. The subspace

$$\ker(\tau) = \{v \in V \mid \tau v = 0\}$$

is called the kernel of $\tau$ and the subspace

$$\mathrm{im}(\tau) = \{\tau v \mid v \in V\}$$

is called the image of $\tau$. The dimension of $\ker(\tau)$ is called the nullity of $\tau$ and is
denoted by $\mathrm{null}(\tau)$. The dimension of $\mathrm{im}(\tau)$ is called the rank of $\tau$ and is
denoted by $\mathrm{rk}(\tau)$. □


It is routine to show that $\ker(\tau)$ is a subspace of V and $\mathrm{im}(\tau)$ is a subspace of
W. Moreover, we have the following.


Theorem 2.3 Let $\tau \in L(V, W)$. Then

1) $\tau$ is surjective if and only if $\mathrm{im}(\tau) = W$

2) $\tau$ is injective if and only if $\ker(\tau) = \{0\}$

Proof. The first statement is merely a restatement of the definition of
surjectivity. To see the validity of the second statement, observe that

$$\tau u = \tau v \iff \tau(u - v) = 0 \iff u - v \in \ker(\tau)$$

Hence, if $\ker(\tau) = \{0\}$, then $\tau u = \tau v \iff u = v$, which shows that $\tau$ is injective.
Conversely, if $\tau$ is injective and $u \in \ker(\tau)$, then $\tau u = \tau 0$ and so $u = 0$. This
shows that $\ker(\tau) = \{0\}$. □




Isomorphisms 


Definition A bijective linear transformation $\tau : V \to W$ is called an
isomorphism from V to W. When an isomorphism from V to W exists, we say
that V and W are isomorphic and write $V \approx W$. □


Example 2.2 Let $\dim(V) = n$. For any ordered basis B of V, the coordinate
map $\phi_B : V \to F^n$ that sends each vector $v \in V$ to its coordinate matrix
$[v]_B \in F^n$ is an isomorphism. Hence, any n-dimensional vector space over F is
isomorphic to $F^n$. □


Isomorphic vector spaces share many properties, as the next theorem shows. If
$\tau \in L(V, W)$ and $S \subseteq V$ we write

$$\tau S = \{\tau s \mid s \in S\}$$


Theorem 2.4 Let $\tau \in L(V, W)$ be an isomorphism. Let $S \subseteq V$. Then

1) S spans V if and only if $\tau S$ spans W.

2) S is linearly independent in V if and only if $\tau S$ is linearly independent in
   W.

3) S is a basis for V if and only if $\tau S$ is a basis for W. □


An isomorphism can be characterized as a linear transformation $\tau : V \to W$ that
maps a basis for V to a basis for W.


Theorem 2.5 A linear transformation $\tau \in L(V, W)$ is an isomorphism if and
only if there is a basis B for V for which $\tau B$ is a basis for W. In this case, $\tau$
maps any basis of V to a basis of W. □


The following theorem says that, up to isomorphism, there is only one vector 
space of any given dimension over a given field. 


Theorem 2.6 Let V and W be vector spaces over F. Then $V \approx W$ if and only
if $\dim(V) = \dim(W)$. □


In Example 2.2, we saw that any n-dimensional vector space is isomorphic to
$F^n$. Now suppose that B is a set of cardinality $\kappa$ and let $(F^B)_0$ be the vector
space of all functions from B to F with finite support. We leave it to the reader
to show that the functions $\delta_b \in (F^B)_0$ defined for all $b \in B$ by

$$\delta_b(x) = \begin{cases} 1 & \text{if } x = b \\ 0 & \text{if } x \ne b \end{cases}$$

form a basis for $(F^B)_0$, called the standard basis. Hence, $\dim((F^B)_0) = |B|$.


It follows that for any cardinal number $\kappa$, there is a vector space of dimension $\kappa$.
Also, any vector space of dimension $\kappa$ is isomorphic to $(F^B)_0$.


Theorem 2.7 If n is a natural number, then any n-dimensional vector space
over F is isomorphic to $F^n$. If $\kappa$ is any cardinal number and if B is a set of
cardinality $\kappa$, then any $\kappa$-dimensional vector space over F is isomorphic to the
vector space $(F^B)_0$ of all functions from B to F with finite support. □


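The Rank Plus Nullity Theorem

Let $\tau \in L(V, W)$. Since any subspace of V has a complement, we can write

$$V = \ker(\tau) \oplus \ker(\tau)^c$$

where $\ker(\tau)^c$ is a complement of $\ker(\tau)$ in V. It follows that

$$\dim(V) = \dim(\ker(\tau)) + \dim(\ker(\tau)^c)$$

Now, the restriction of $\tau$ to $\ker(\tau)^c$,

$$\tau^c : \ker(\tau)^c \to W$$

is injective, since

$$\ker(\tau^c) = \ker(\tau) \cap \ker(\tau)^c = \{0\}$$

Also, $\mathrm{im}(\tau^c) \subseteq \mathrm{im}(\tau)$. For the reverse inclusion, if $\tau v \in \mathrm{im}(\tau)$, then since
$v = u + w$ for $u \in \ker(\tau)$ and $w \in \ker(\tau)^c$, we have

$$\tau v = \tau u + \tau w = \tau w = \tau^c w \in \mathrm{im}(\tau^c)$$

Thus $\mathrm{im}(\tau^c) = \mathrm{im}(\tau)$. It follows that

$$\ker(\tau)^c \approx \mathrm{im}(\tau)$$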


From this, we deduce the following theorem. 


Theorem 2.8 Let $\tau \in L(V, W)$.
1) Any complement of $\ker(\tau)$ is isomorphic to $\mathrm{im}(\tau)$
2) (The rank plus nullity theorem)

$$\dim(\ker(\tau)) + \dim(\mathrm{im}(\tau)) = \dim(V)$$

or, in other notation,

$$\mathrm{rk}(\tau) + \mathrm{null}(\tau) = \dim(V)$$
□
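
For a map given by a matrix, the rank plus nullity theorem can be checked by computing a basis of the kernel explicitly. The following is a minimal sketch, assuming exact rational arithmetic and the map $v \mapsto Av$; the helper names `rref` and `kernel_basis` and the sample matrix are illustrative assumptions, not from the text.

```python
from fractions import Fraction


def rref(rows):
    """Return (reduced row echelon form, list of pivot columns) over Q."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0                                       # index of the next pivot row
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        m[r] = [x / m[r][col] for x in m[r]]    # scale the pivot entry to 1
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                m[i] = [a - m[i][col] * b for a, b in zip(m[i], m[r])]
        pivots.append(col)
        r += 1
        if r == len(m):
            break
    return m, pivots


def kernel_basis(rows):
    """Basis of {v : Av = 0}, with one basis vector per non-pivot column."""
    n = len(rows[0])
    m, pivots = rref(rows)
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for row_index, p in enumerate(pivots):
            v[p] = -m[row_index][free]
        basis.append(v)
    return basis


A = [[1, 2, 3, 4],
     [2, 4, 6, 8],
     [0, 1, 1, 0]]

_, pivot_columns = rref(A)
rank_A = len(pivot_columns)      # dimension of the image (column space)
kernel = kernel_basis(A)
nullity_A = len(kernel)          # dimension of the kernel
```

The number of pivot columns plus the number of free columns is the number of columns of A, which is the theorem in matrix form.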


Theorem 2.8 has an important corollary. 


Corollary 2.9 Let $\tau \in L(V, W)$, where $\dim(V) = \dim(W) < \infty$. Then $\tau$ is
injective if and only if it is surjective. □


Note that this result fails if the vector spaces are not finite-dimensional. The 
reader is encouraged to find an example to support this statement. 


Linear Transformations from $F^n$ to $F^m$

Recall that for any $m \times n$ matrix A over F the multiplication map

$$\tau_A(v) = Av$$

is a linear transformation. In fact, any linear transformation $\tau \in L(F^n, F^m)$ has
this form, that is, $\tau$ is just multiplication by a matrix, for we have

$$(\tau e_1 \mid \cdots \mid \tau e_n) e_i = (\tau e_1 \mid \cdots \mid \tau e_n)^{(i)} = \tau e_i$$

where the superscript $(i)$ denotes the ith column, and so $\tau = \tau_A$, where

$$A = (\tau e_1 \mid \cdots \mid \tau e_n)$$


Theorem 2.10
1) If A is an $m \times n$ matrix over F then $\tau_A \in L(F^n, F^m)$.
2) If $\tau \in L(F^n, F^m)$ then $\tau = \tau_A$, where

$$A = (\tau e_1 \mid \cdots \mid \tau e_n)$$

The matrix A is called the matrix of $\tau$. □


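Example 2.3 Consider the linear transformation $\tau : F^3 \to F^3$ defined by

$$\tau(x, y, z) = (x - 2y, z, x + y + z)$$

Then we have, in column form,

$$\tau \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x - 2y \\ z \\ x + y + z \end{pmatrix} = \begin{pmatrix} 1 & -2 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}$$

and so the standard matrix of $\tau$ is

$$A = \begin{pmatrix} 1 & -2 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 1 \end{pmatrix}$$
□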
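
The recipe "the ith column of A is $\tau e_i$" can be carried out mechanically for the map of Example 2.3. The following is a minimal sketch over the integers; the helper names `standard_matrix` and `mat_vec` are illustrative, not from the text.

```python
def tau(v):
    """The map of Example 2.3: (x, y, z) -> (x - 2y, z, x + y + z)."""
    x, y, z = v
    return [x - 2 * y, z, x + y + z]


def standard_matrix(f, n):
    """Matrix whose i-th column is f(e_i), returned as a list of rows."""
    columns = []
    for i in range(n):
        e_i = [1 if j == i else 0 for j in range(n)]
        columns.append(f(e_i))
    return [list(row) for row in zip(*columns)]


def mat_vec(A, v):
    """Matrix-vector product Av, with A given as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


A = standard_matrix(tau, 3)
v = [4, -1, 7]
same = mat_vec(A, v) == tau(v)   # multiplication by A agrees with tau
```

The computed `A` agrees with the standard matrix displayed in the example.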


If $A \in M_{m,n}$, then since the image of $\tau_A$ is the column space of A, we have

$$\dim(\ker(\tau_A)) + \mathrm{rk}(A) = \dim(F^n)$$

This gives the following useful result.


Theorem 2.11 Let A be an $m \times n$ matrix over F.
1) $\tau_A : F^n \to F^m$ is injective if and only if $\mathrm{rk}(A) = n$.
2) $\tau_A : F^n \to F^m$ is surjective if and only if $\mathrm{rk}(A) = m$. □


Change of Basis Matrices


Suppose that $B = (b_1, \dots, b_n)$ and $C = (c_1, \dots, c_n)$ are ordered bases for a
vector space V. It is natural to ask how the coordinate matrices $[v]_B$ and $[v]_C$ are
related. Referring to Figure 2.1,

[Figure 2.1: diagram relating V and $F^n$ via the coordinate maps $\phi_B$ and $\phi_C$]

the map that takes $[v]_B$ to $[v]_C$ is $\phi_{B,C} = \phi_C \phi_B^{-1}$ and is called the change of basis
operator (or change of coordinates operator). Since $\phi_{B,C}$ is an operator on
$F^n$, it has the form $\tau_A$, where

$$\begin{aligned}
A &= (\phi_{B,C}(e_1) \mid \cdots \mid \phi_{B,C}(e_n)) \\
&= (\phi_C \phi_B^{-1}([b_1]_B) \mid \cdots \mid \phi_C \phi_B^{-1}([b_n]_B)) \\
&= ([b_1]_C \mid \cdots \mid [b_n]_C)
\end{aligned}$$

We denote A by $M_{B,C}$ and call it the change of basis matrix from B to C.

Theorem 2.12 Let $B = (b_1, \dots, b_n)$ and C be ordered bases for a vector space
V. Then the change of basis operator $\phi_{B,C} = \phi_C \phi_B^{-1}$ is an automorphism of $F^n$,
whose standard matrix is

$$M_{B,C} = ([b_1]_C \mid \cdots \mid [b_n]_C)$$

Hence

$$[v]_C = M_{B,C} [v]_B$$

and $M_{C,B} = M_{B,C}^{-1}$. □
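
A small worked example may help fix the direction of the change of basis matrix. The bases below are illustrative choices in $V = \mathbb{R}^2$ and are not taken from the text: B is the standard ordered basis and C is a second ordered basis.

```latex
\[
B = \bigl( (1,0),\ (0,1) \bigr), \qquad C = (c_1, c_2) = \bigl( (1,1),\ (1,-1) \bigr)
\]
\[
b_1 = \tfrac12 c_1 + \tfrac12 c_2, \qquad b_2 = \tfrac12 c_1 - \tfrac12 c_2
\quad\Longrightarrow\quad
M_{B,C} = \bigl( [b_1]_C \mid [b_2]_C \bigr)
        = \begin{pmatrix} \tfrac12 & \tfrac12 \\ \tfrac12 & -\tfrac12 \end{pmatrix}
\]
\[
v = (3,1): \qquad
[v]_C = M_{B,C}[v]_B
      = \begin{pmatrix} \tfrac12 & \tfrac12 \\ \tfrac12 & -\tfrac12 \end{pmatrix}
        \begin{pmatrix} 3 \\ 1 \end{pmatrix}
      = \begin{pmatrix} 2 \\ 1 \end{pmatrix},
\qquad 2c_1 + c_2 = (3,1)
\]
\[
M_{C,B} = \bigl( [c_1]_B \mid [c_2]_B \bigr)
        = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},
\qquad M_{B,C}\, M_{C,B} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\]
```

The last line confirms, for this example, that $M_{C,B}$ is the inverse of $M_{B,C}$.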


Consider the equation

$$A = M_{B,C}$$

or equivalently,

$$A = ([b_1]_C \mid \cdots \mid [b_n]_C)$$

Then given any two of A (an invertible $n \times n$ matrix), B (an ordered basis for
$F^n$) and C (an ordered basis for $F^n$), the third component is uniquely
determined by this equation. This is clear if B and C are given or if A and C are
given. If A and B are given, then there is a unique C for which $A^{-1} = M_{C,B}$ and
so there is a unique C for which $A = M_{B,C}$.


Theorem 2.13 If we are given any two of the following:
1) an invertible $n \times n$ matrix A

2) an ordered basis B for $F^n$

3) an ordered basis C for $F^n$

then the third is uniquely determined by the equation

$$A = M_{B,C}$$
□


The Matrix of a Linear Transformation 


Let $\tau : V \to W$ be a linear transformation, where $\dim(V) = n$ and
$\dim(W) = m$ and let $B = (b_1, \dots, b_n)$ be an ordered basis for V and C an
ordered basis for W. Then the map

$$\theta : [v]_B \mapsto [\tau v]_C$$

is a representation of $\tau$ as a linear transformation from $F^n$ to $F^m$, in the sense
that knowing $\theta$ (along with B and C, of course) is equivalent to knowing $\tau$. Of
course, this representation depends on the choice of ordered bases B and C.


Since $\theta$ is a linear transformation from $F^n$ to $F^m$, it is just multiplication by an
$m \times n$ matrix A, that is,

$$[\tau v]_C = A[v]_B$$

Indeed, since $[b_i]_B = e_i$, we get the columns of A as follows:

$$A^{(i)} = Ae_i = A[b_i]_B = [\tau b_i]_C$$


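Theorem 2.14 Let $\tau \in \mathcal{L}(V,W)$ and let $\mathcal{B} = (b_1,\dots,b_n)$ and $\mathcal{C}$ be ordered
bases for $V$ and $W$, respectively. Then $\tau$ can be represented with respect to $\mathcal{B}$
and $\mathcal{C}$ as matrix multiplication, that is,


$$[\tau v]_\mathcal{C} = [\tau]_{\mathcal{B},\mathcal{C}}[v]_\mathcal{B}$$


where


$$[\tau]_{\mathcal{B},\mathcal{C}} = ([\tau b_1]_\mathcal{C} \mid \cdots \mid [\tau b_n]_\mathcal{C})$$


is called the matrix of $\tau$ with respect to the bases $\mathcal{B}$ and $\mathcal{C}$. When $V = W$ and
$\mathcal{B} = \mathcal{C}$, we denote $[\tau]_{\mathcal{B},\mathcal{B}}$ by $[\tau]_\mathcal{B}$ and so


$$[\tau v]_\mathcal{B} = [\tau]_\mathcal{B}[v]_\mathcal{B}$$ □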


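Example 2.4 Let $D\colon \mathcal{P}_2 \to \mathcal{P}_2$ be the derivative operator, defined on the vector
space of all polynomials of degree at most 2. Let $\mathcal{B} = \mathcal{C} = (1, x, x^2)$. Then


$$[D(1)]_\mathcal{C} = [0]_\mathcal{C} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \quad
[D(x)]_\mathcal{C} = [1]_\mathcal{C} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad
[D(x^2)]_\mathcal{C} = [2x]_\mathcal{C} = \begin{pmatrix} 0 \\ 2 \\ 0 \end{pmatrix}$$

and so

$$[D]_\mathcal{B} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix}$$


Hence, for example, if $p(x) = 5 + x + 2x^2$, then

$$[Dp(x)]_\mathcal{C} = [D]_\mathcal{B}[p(x)]_\mathcal{B} =
\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 5 \\ 1 \\ 2 \end{pmatrix} =
\begin{pmatrix} 1 \\ 4 \\ 0 \end{pmatrix}$$


and so $Dp(x) = 1 + 4x$. □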
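The computation in Example 2.4 can be reproduced mechanically. The sketch below is an illustration only; it assumes that polynomials are stored as coefficient lists `[a0, a1, a2]` with respect to $(1, x, x^2)$, and the function names are hypothetical.

```python
def derivative(p):
    """Derivative of a polynomial given by its coefficient list [a0, a1, a2, ...]."""
    return [k * p[k] for k in range(1, len(p))]

def coords(p, n):
    """Coordinate column of p with respect to (1, x, ..., x^(n-1)), zero padded."""
    return list(p) + [0] * (n - len(p))

def matrix_of_derivative(n):
    """Matrix of D: column j holds the coordinates of D(x^j), j = 0, ..., n-1."""
    cols = []
    for j in range(n):
        basis_vec = [0] * n
        basis_vec[j] = 1
        cols.append(coords(derivative(basis_vec), n))
    # Transpose the list of columns into a list of rows.
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def apply(M, x):
    """Multiply a matrix (list of rows) by a coordinate column."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

D = matrix_of_derivative(3)
p = [5, 1, 2]              # p(x) = 5 + x + 2x^2
Dp = apply(D, p)           # coordinates of Dp(x)
```

The resulting `Dp` is the coordinate column of $1 + 4x$, matching the example.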


The following result shows that we may work equally well with linear
transformations or with the matrices that represent them (with respect to fixed
ordered bases $\mathcal{B}$ and $\mathcal{C}$). This applies not only to addition and scalar
multiplication, but also to matrix multiplication.


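Theorem 2.15 Let $V$ and $W$ be finite-dimensional vector spaces over $F$, with
ordered bases $\mathcal{B} = (b_1,\dots,b_n)$ and $\mathcal{C} = (c_1,\dots,c_m)$, respectively.
1) The map $\mu\colon \mathcal{L}(V,W) \to \mathcal{M}_{m,n}(F)$ defined by


$$\mu(\tau) = [\tau]_{\mathcal{B},\mathcal{C}}$$


is an isomorphism and so $\mathcal{L}(V,W) \approx \mathcal{M}_{m,n}(F)$. Hence,

$$\dim(\mathcal{L}(V,W)) = \dim(\mathcal{M}_{m,n}(F)) = m \times n$$


2) If $\sigma \in \mathcal{L}(U,V)$ and $\tau \in \mathcal{L}(V,W)$ and if $\mathcal{B}$, $\mathcal{C}$ and $\mathcal{D}$ are ordered bases for
$U$, $V$ and $W$, respectively, then


$$[\tau\sigma]_{\mathcal{B},\mathcal{D}} = [\tau]_{\mathcal{C},\mathcal{D}}[\sigma]_{\mathcal{B},\mathcal{C}}$$


Thus, the matrix of the product (composition) $\tau\sigma$ is the product of the
matrices of $\tau$ and $\sigma$. In fact, this is the primary motivation for the definition
of matrix multiplication.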

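Proof. To see that $\mu$ is linear, observe that for all $i$,


$$\begin{aligned}
[s\sigma + t\tau]_{\mathcal{B},\mathcal{C}}[b_i]_\mathcal{B}
&= [(s\sigma + t\tau)(b_i)]_\mathcal{C} \\
&= [s\sigma(b_i) + t\tau(b_i)]_\mathcal{C} \\
&= s[\sigma(b_i)]_\mathcal{C} + t[\tau(b_i)]_\mathcal{C} \\
&= s[\sigma]_{\mathcal{B},\mathcal{C}}[b_i]_\mathcal{B} + t[\tau]_{\mathcal{B},\mathcal{C}}[b_i]_\mathcal{B} \\
&= (s[\sigma]_{\mathcal{B},\mathcal{C}} + t[\tau]_{\mathcal{B},\mathcal{C}})[b_i]_\mathcal{B}
\end{aligned}$$


and since $[b_i]_\mathcal{B} = e_i$ is a standard basis vector, we conclude that

$$[s\sigma + t\tau]_{\mathcal{B},\mathcal{C}} = s[\sigma]_{\mathcal{B},\mathcal{C}} + t[\tau]_{\mathcal{B},\mathcal{C}}$$


and so $\mu$ is linear. If $A \in \mathcal{M}_{m,n}$, we define $\tau$ by the condition $[\tau b_i]_\mathcal{C} = A^{(i)}$,
whence $\mu(\tau) = A$ and $\mu$ is surjective. Also, $\ker(\mu) = \{0\}$ since $[\tau]_{\mathcal{B},\mathcal{C}} = 0$
implies that $\tau = 0$. Hence, the map $\mu$ is an isomorphism. To prove part 2), we
have


$$[\tau\sigma]_{\mathcal{B},\mathcal{D}}[v]_\mathcal{B} = [\tau(\sigma v)]_\mathcal{D} = [\tau]_{\mathcal{C},\mathcal{D}}[\sigma v]_\mathcal{C} = [\tau]_{\mathcal{C},\mathcal{D}}[\sigma]_{\mathcal{B},\mathcal{C}}[v]_\mathcal{B}$$ □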
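Part 2) of Theorem 2.15 says that the matrix of a composition is the product of the matrices. As a small sanity check under stated assumptions (standard bases on $\mathbb{Q}^2$ and $\mathbb{Q}^3$, and two hypothetical maps `sigma` and `tau` invented for illustration), one can compare both sides directly:

```python
def matrix_of(f, n):
    """Matrix of a linear map f on n-tuples w.r.t. standard bases: column j is f(e_j)."""
    cols = [f(tuple(1 if i == j else 0 for i in range(n))) for j in range(n)]
    m = len(cols[0])
    return [[cols[j][i] for j in range(n)] for i in range(m)]

def matmul(P, Q):
    """Product of matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def sigma(u):
    """Hypothetical linear map Q^2 -> Q^3."""
    x, y = u
    return (x + y, 2 * x, y)

def tau(v):
    """Hypothetical linear map Q^3 -> Q^2."""
    a, b, c = v
    return (a - c, b + 3 * c)

def tau_sigma(u):
    """The composition tau after sigma, a map Q^2 -> Q^2."""
    return tau(sigma(u))

lhs = matrix_of(tau_sigma, 2)
rhs = matmul(matrix_of(tau, 3), matrix_of(sigma, 2))
```

Both `lhs` and `rhs` come out as the same $2 \times 2$ matrix.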


Change of Bases for Linear Transformations 


Since the matrix $[\tau]_{\mathcal{B},\mathcal{C}}$ that represents $\tau$ depends on the ordered bases $\mathcal{B}$ and $\mathcal{C}$, it
is natural to wonder how to choose these bases in order to make this matrix as
simple as possible. For instance, can we always choose the bases so that $\tau$ is
represented by a diagonal matrix?


As we will see in Chapter 7, the answer to this question is no. In that chapter, 
we will take up the general question of how best to represent a linear operator 
by a matrix. For now, let us take the first step and describe the relationship 
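between the matrices $[\tau]_{\mathcal{B},\mathcal{C}}$ and $[\tau]_{\mathcal{B}',\mathcal{C}'}$ of $\tau$ with respect to two different pairs
$(\mathcal{B},\mathcal{C})$ and $(\mathcal{B}',\mathcal{C}')$ of ordered bases. Multiplication by $[\tau]_{\mathcal{B}',\mathcal{C}'}$ sends $[v]_{\mathcal{B}'}$ to
$[\tau v]_{\mathcal{C}'}$. This can be reproduced by first switching from $\mathcal{B}'$ to $\mathcal{B}$, then applying
$[\tau]_{\mathcal{B},\mathcal{C}}$ and finally switching from $\mathcal{C}$ to $\mathcal{C}'$, that is,


$$[\tau]_{\mathcal{B}',\mathcal{C}'} = M_{\mathcal{C},\mathcal{C}'}[\tau]_{\mathcal{B},\mathcal{C}}M_{\mathcal{B}',\mathcal{B}} = M_{\mathcal{C},\mathcal{C}'}[\tau]_{\mathcal{B},\mathcal{C}}M_{\mathcal{B},\mathcal{B}'}^{-1}$$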
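
Theorem 2.16 Let $\tau \in \mathcal{L}(V,W)$ and let $(\mathcal{B},\mathcal{C})$ and $(\mathcal{B}',\mathcal{C}')$ be pairs of ordered
bases of $V$ and $W$, respectively. Then

$$[\tau]_{\mathcal{B}',\mathcal{C}'} = M_{\mathcal{C},\mathcal{C}'}[\tau]_{\mathcal{B},\mathcal{C}}M_{\mathcal{B}',\mathcal{B}} \qquad (2.1)$$ □


When $\tau \in \mathcal{L}(V)$ is a linear operator on $V$, it is generally more convenient to
represent $\tau$ by matrices of the form $[\tau]_\mathcal{B}$, where the ordered bases used to
represent vectors in the domain and image are the same. When $\mathcal{B} = \mathcal{C}$, Theorem
2.16 takes the following important form.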


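Corollary 2.17 Let $\tau \in \mathcal{L}(V)$ and let $\mathcal{B}$ and $\mathcal{C}$ be ordered bases for $V$. Then the
matrix of $\tau$ with respect to $\mathcal{C}$ can be expressed in terms of the matrix of $\tau$ with
respect to $\mathcal{B}$ as follows:


$$[\tau]_\mathcal{C} = M_{\mathcal{B},\mathcal{C}}[\tau]_\mathcal{B}M_{\mathcal{B},\mathcal{C}}^{-1} \qquad (2.2)$$ □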
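As an illustrative check of (2.2), not taken from the text, assume $V = \mathbb{Q}^2$, let $\mathcal{B} = (e_1, e_2)$ be the standard basis, let $\mathcal{C} = (c_1, c_2) = ((1,0), (1,1))$ and let $\tau(x,y) = (2x + y, 3y)$. Since $e_1 = c_1$ and $e_2 = c_2 - c_1$, a short computation gives:

```latex
[\tau]_\mathcal{B} = \begin{pmatrix} 2 & 1 \\ 0 & 3 \end{pmatrix}, \qquad
M_{\mathcal{B},\mathcal{C}} = ([e_1]_\mathcal{C} \mid [e_2]_\mathcal{C})
  = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}, \qquad
M_{\mathcal{B},\mathcal{C}}^{-1} = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}

M_{\mathcal{B},\mathcal{C}} [\tau]_\mathcal{B} M_{\mathcal{B},\mathcal{C}}^{-1}
  = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}
    \begin{pmatrix} 2 & 1 \\ 0 & 3 \end{pmatrix}
    \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}
  = \begin{pmatrix} 2 & -2 \\ 0 & 3 \end{pmatrix}
    \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}
  = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}
```

This agrees with a direct computation of $[\tau]_\mathcal{C}$, because $\tau c_1 = (2,0) = 2c_1$ and $\tau c_2 = (3,3) = 3c_2$.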


Equivalence of Matrices 


Since the change of basis matrices are precisely the invertible matrices, (2.1) has
the form


$$[\tau]_{\mathcal{B}',\mathcal{C}'} = P[\tau]_{\mathcal{B},\mathcal{C}}Q^{-1}$$


where $P$ and $Q$ are invertible matrices. This motivates the following definition.


Definition Two matrices $A$ and $B$ are equivalent if there exist invertible
matrices $P$ and $Q$ for which


$$B = PAQ^{-1}$$ □


We have remarked that B is equivalent to A if and only if B can be obtained 
from A by a series of elementary row and column operations. Performing the 
row operations is equivalent to multiplying the matrix $A$ on the left by $P$ and
performing the column operations is equivalent to multiplying $A$ on the right by
$Q^{-1}$.


In terms of (2.1), we see that performing row operations (premultiplying by $P$)
is equivalent to changing the basis used to represent vectors in the image and
performing column operations (postmultiplying by $Q^{-1}$) is equivalent to
changing the basis used to represent vectors in the domain.


According to Theorem 2.16, if $A$ and $B$ are matrices that represent $\tau$ with
respect to possibly different ordered bases, then A and B are equivalent. The 
converse of this also holds. 


Theorem 2.18 Let $V$ and $W$ be vector spaces with $\dim(V) = n$ and
$\dim(W) = m$. Then two $m \times n$ matrices $A$ and $B$ are equivalent if and only if
they represent the same linear transformation $\tau \in \mathcal{L}(V,W)$, but possibly with
respect to different ordered bases. In this case, $A$ and $B$ represent exactly the
same set of linear transformations in $\mathcal{L}(V,W)$.

Proof. If $A$ and $B$ represent $\tau$, that is, if


$$A = [\tau]_{\mathcal{B},\mathcal{C}} \quad\text{and}\quad B = [\tau]_{\mathcal{B}',\mathcal{C}'}$$


for ordered bases $\mathcal{B}$, $\mathcal{C}$, $\mathcal{B}'$ and $\mathcal{C}'$, then Theorem 2.16 shows that $A$ and $B$ are
equivalent. Now suppose that $A$ and $B$ are equivalent, say


$$B = PAQ^{-1}$$


where $P$ and $Q$ are invertible. Suppose also that $A$ represents a linear
transformation $\tau \in \mathcal{L}(V,W)$ for some ordered bases $\mathcal{B}$ and $\mathcal{C}$, that is,


$$A = [\tau]_{\mathcal{B},\mathcal{C}}$$


Theorem 2.13 implies that there is a unique ordered basis $\mathcal{B}'$ for $V$ for which
$Q = M_{\mathcal{B},\mathcal{B}'}$ and a unique ordered basis $\mathcal{C}'$ for $W$ for which $P = M_{\mathcal{C},\mathcal{C}'}$. Hence


$$B = M_{\mathcal{C},\mathcal{C}'}[\tau]_{\mathcal{B},\mathcal{C}}M_{\mathcal{B}',\mathcal{B}} = [\tau]_{\mathcal{B}',\mathcal{C}'}$$


Hence, $B$ also represents $\tau$. By symmetry, we see that $A$ and $B$ represent the
same set of linear transformations. This completes the proof. □


We remarked in Example 0.3 that every matrix is equivalent to exactly one
matrix of the block form


$$J_k = \begin{pmatrix} I_k & 0_{k,n-k} \\ 0_{m-k,k} & 0_{m-k,n-k} \end{pmatrix}_{\text{block}}$$


Hence, the set of these matrices is a set of canonical forms for equivalence.
Moreover, the rank is a complete invariant for equivalence. In other words, two
matrices are equivalent if and only if they have the same rank.
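Since rank is a complete invariant, deciding equivalence of two matrices of the same size reduces to comparing ranks. The following sketch assumes rational entries and uses plain Gaussian elimination; the function names and the sample matrices are hypothetical.

```python
from fractions import Fraction

def rank(A):
    """Rank of a matrix (list of rows) over Q, by Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    r = 0                                   # number of pivots found so far
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue                        # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r

def canonical_form(m, n, k):
    """The m x n block matrix J_k: I_k in the upper-left corner, zeros elsewhere."""
    return [[1 if i == j and i < k else 0 for j in range(n)] for i in range(m)]

def equivalent(A, B):
    """Same size and same rank, which characterizes equivalence of matrices."""
    return (len(A), len(A[0])) == (len(B), len(B[0])) and rank(A) == rank(B)

A = [[1, 2, 3], [2, 4, 6]]
B = [[0, 0, 5], [0, 0, 0]]
```

For these sample matrices, `A`, `B` and `canonical_form(2, 3, 1)` all have rank 1 and are therefore pairwise equivalent.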


Similarity of Matrices 


When a linear operator $\tau \in \mathcal{L}(V)$ is represented by a matrix of the form $[\tau]_\mathcal{B}$,
equation (2.2) has the form


$$[\tau]_\mathcal{C} = P[\tau]_\mathcal{B}P^{-1}$$

where $P$ is an invertible matrix. This motivates the following definition.


Definition Two matrices $A$ and $B$ are similar, denoted by $A \sim B$, if there
exists an invertible matrix $P$ for which

$$B = PAP^{-1}$$

The equivalence classes associated with similarity are called similarity
classes. □


The analog of Theorem 2.18 for square matrices is the following. 


Theorem 2.19 Let $V$ be a vector space of dimension $n$. Then two $n \times n$
matrices $A$ and $B$ are similar if and only if they represent the same linear
operator $\tau \in \mathcal{L}(V)$, but possibly with respect to different ordered bases. In this
case, $A$ and $B$ represent exactly the same set of linear operators in $\mathcal{L}(V)$.

Proof. If $A$ and $B$ represent $\tau \in \mathcal{L}(V)$, that is, if


$$A = [\tau]_\mathcal{B} \quad\text{and}\quad B = [\tau]_\mathcal{C}$$


for ordered bases $\mathcal{B}$ and $\mathcal{C}$, then Corollary 2.17 shows that $A$ and $B$ are similar.
Now suppose that $A$ and $B$ are similar, say


$$B = PAP^{-1}$$


Suppose also that $A$ represents a linear operator $\tau \in \mathcal{L}(V)$ for some ordered
basis $\mathcal{B}$, that is,


$$A = [\tau]_\mathcal{B}$$


Theorem 2.13 implies that there is a unique ordered basis $\mathcal{C}$ for $V$ for which
$P = M_{\mathcal{B},\mathcal{C}}$. Hence

$$B = M_{\mathcal{B},\mathcal{C}}[\tau]_\mathcal{B}M_{\mathcal{B},\mathcal{C}}^{-1} = [\tau]_\mathcal{C}$$


Hence, $B$ also represents $\tau$. By symmetry, we see that $A$ and $B$ represent the
same set of linear operators. This completes the proof. □


We will devote much effort in Chapter 7 to finding a canonical form for 
similarity. 


Similarity of Operators 


We can also define similarity of operators. 


Definition Two linear operators $\tau, \sigma \in \mathcal{L}(V)$ are similar, denoted by $\tau \sim \sigma$, if
there exists an automorphism $\phi \in \mathcal{L}(V)$ for which


$$\sigma = \phi\tau\phi^{-1}$$


The equivalence classes associated with similarity are called similarity
classes. □


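Note that if $\mathcal{B} = (b_1,\dots,b_n)$ and $\mathcal{C} = (c_1,\dots,c_n)$ are ordered bases for $V$, then

$$M_{\mathcal{C},\mathcal{B}} = ([c_1]_\mathcal{B} \mid \cdots \mid [c_n]_\mathcal{B})$$

Now, the map defined by $\phi(b_i) = c_i$ is an automorphism of $V$ and

$$M_{\mathcal{C},\mathcal{B}} = ([\phi(b_1)]_\mathcal{B} \mid \cdots \mid [\phi(b_n)]_\mathcal{B}) = [\phi]_\mathcal{B}$$


Conversely, if $\phi\colon V \to V$ is an automorphism and $\mathcal{B} = (b_1,\dots,b_n)$ is an ordered
basis for $V$, then $\mathcal{C} = (c_1 = \phi(b_1),\dots,c_n = \phi(b_n))$ is also a basis:


$$[\phi]_\mathcal{B} = ([\phi(b_1)]_\mathcal{B} \mid \cdots \mid [\phi(b_n)]_\mathcal{B}) = M_{\mathcal{C},\mathcal{B}}$$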


The analog of Theorem 2.19 for linear operators is the following. 


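Theorem 2.20 Let $V$ be a vector space of dimension $n$. Then two linear
operators $\tau$ and $\sigma$ on $V$ are similar if and only if there is a matrix $A \in \mathcal{M}_n$ that
represents both operators, but with respect to possibly different ordered bases.
In this case, $\tau$ and $\sigma$ are represented by exactly the same set of matrices in $\mathcal{M}_n$.

Proof. If $\tau$ and $\sigma$ are represented by $A \in \mathcal{M}_n$, that is, if


$$[\tau]_\mathcal{B} = A = [\sigma]_\mathcal{C}$$

for ordered bases $\mathcal{B}$ and $\mathcal{C}$, then

$$[\sigma]_\mathcal{C} = [\tau]_\mathcal{B} = M_{\mathcal{C},\mathcal{B}}[\tau]_\mathcal{C}M_{\mathcal{B},\mathcal{C}}$$


As remarked above, if $\phi\colon V \to V$ is defined by $\phi(c_i) = b_i$, then

$$[\phi]_\mathcal{C} = M_{\mathcal{B},\mathcal{C}}$$

and so

$$[\sigma]_\mathcal{C} = [\phi]_\mathcal{C}^{-1}[\tau]_\mathcal{C}[\phi]_\mathcal{C} = [\phi^{-1}\tau\phi]_\mathcal{C}$$


from which it follows that $\sigma$ and $\tau$ are similar. Conversely, suppose that $\tau$ and $\sigma$
are similar, say


$$\sigma = \phi\tau\phi^{-1}$$


where $\phi$ is an automorphism of $V$. Suppose also that $\tau$ is represented by the
matrix $A \in \mathcal{M}_n$, that is,


$$A = [\tau]_\mathcal{B}$$

for some ordered basis $\mathcal{B} = (b_1,\dots,b_n)$. Setting $\mathcal{C} = (\phi(b_1),\dots,\phi(b_n))$, we have $[\phi]_\mathcal{B} = M_{\mathcal{C},\mathcal{B}}$ and so

$$[\sigma]_\mathcal{B} = [\phi\tau\phi^{-1}]_\mathcal{B} = [\phi]_\mathcal{B}[\tau]_\mathcal{B}[\phi]_\mathcal{B}^{-1} = M_{\mathcal{C},\mathcal{B}}[\tau]_\mathcal{B}M_{\mathcal{C},\mathcal{B}}^{-1}$$


It follows that


$$A = [\tau]_\mathcal{B} = M_{\mathcal{B},\mathcal{C}}[\sigma]_\mathcal{B}M_{\mathcal{B},\mathcal{C}}^{-1} = [\sigma]_\mathcal{C}$$


and so $A$ also represents $\sigma$. By symmetry, we see that $\tau$ and $\sigma$ are represented
by the same set of matrices. This completes the proof. □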


We can summarize the situation with respect to similarity in Figure 2.2. Each
similarity class $\mathcal{S}$ in $\mathcal{L}(V)$ corresponds to a similarity class $\mathcal{T}$ in $\mathcal{M}_n(F)$: $\mathcal{T}$ is
the set of all matrices that represent any $\tau \in \mathcal{S}$ and $\mathcal{S}$ is the set of all operators
in $\mathcal{L}(V)$ that are represented by any $M \in \mathcal{T}$.


[Figure 2.2: the similarity classes of $\mathcal{L}(V)$ and the corresponding similarity classes of matrices]


Invariant Subspaces and Reducing Pairs 


The restriction of a linear operator $\tau \in \mathcal{L}(V)$ to a subspace $S$ of $V$ is not
necessarily a linear operator on $S$. This prompts the following definition.


Definition Let $\tau \in \mathcal{L}(V)$. A subspace $S$ of $V$ is said to be invariant under $\tau$ or
$\tau$-invariant if $\tau S \subseteq S$, that is, if $\tau s \in S$ for all $s \in S$. Put another way, $S$ is
invariant under $\tau$ if the restriction $\tau|_S$ is a linear operator on $S$. □


If

$$V = S \oplus T$$


then the fact that $S$ is $\tau$-invariant does not imply that the complement $T$ is also
$\tau$-invariant. (The reader may wish to supply a simple example with $V = \mathbb{R}^2$.)


Definition Let $\tau \in \mathcal{L}(V)$. If $V = S \oplus T$ and if both $S$ and $T$ are $\tau$-invariant,
we say that the pair $(S,T)$ reduces $\tau$. □


A reducing pair can be used to decompose a linear operator into a direct sum as 
follows. 


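Definition Let $\tau \in \mathcal{L}(V)$. If $(S,T)$ reduces $\tau$ we write

$$\tau = \tau|_S \oplus \tau|_T$$


and call $\tau$ the direct sum of $\tau|_S$ and $\tau|_T$. Thus, the expression


$$\rho = \sigma \oplus \tau$$

means that there exist subspaces $S$ and $T$ of $V$ for which $(S,T)$ reduces $\rho$ and
$\sigma = \rho|_S$ and $\tau = \rho|_T$. □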


The concept of the direct sum of linear operators will play a key role in the 
study of the structure of a linear operator. 


Projection Operators 


We will have several uses for a special type of linear operator that is related to 
direct sums. 


Definition Let $V = S \oplus T$. The linear operator $\rho_{S,T}\colon V \to V$ defined by

$$\rho_{S,T}(s + t) = s$$

where $s \in S$ and $t \in T$ is called projection onto $S$ along $T$. □


Whenever we say that the operator $\rho_{S,T}$ is a projection, it is with the
understanding that $V = S \oplus T$. The following theorem describes a few basic
properties of projection operators. We leave the proof as an exercise.


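Theorem 2.21 Let $V$ be a vector space and let $\rho \in \mathcal{L}(V)$.
1) If $V = S \oplus T$ then

$$\rho_{S,T} + \rho_{T,S} = \iota$$

2) If $\rho = \rho_{S,T}$ then

$$\operatorname{im}(\rho) = S \quad\text{and}\quad \ker(\rho) = T$$

and so

$$V = \operatorname{im}(\rho) \oplus \ker(\rho)$$

In other words, $\rho$ is projection onto its image along its kernel. Moreover,

$$v \in \operatorname{im}(\rho) \iff \rho v = v$$

3) If $\sigma \in \mathcal{L}(V)$ has the property that

$$V = \operatorname{im}(\sigma) \oplus \ker(\sigma) \quad\text{and}\quad \sigma|_{\operatorname{im}(\sigma)} = \iota$$

then $\sigma$ is projection onto $\operatorname{im}(\sigma)$ along $\ker(\sigma)$. □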

Projection operators are easy to characterize. 


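Definition A linear operator $\tau \in \mathcal{L}(V)$ is idempotent if $\tau^2 = \tau$. □


Theorem 2.22 A linear operator $\rho \in \mathcal{L}(V)$ is a projection if and only if it is
idempotent.
Proof. If $\rho = \rho_{S,T}$, then for any $s \in S$ and $t \in T$,

$$\rho^2(s + t) = \rho s = s = \rho(s + t)$$


and so $\rho^2 = \rho$. Conversely, suppose that $\rho$ is idempotent. If $v \in \operatorname{im}(\rho) \cap \ker(\rho)$,
then $v = \rho x$ and so


$$0 = \rho v = \rho^2 x = \rho x = v$$


Hence $\operatorname{im}(\rho) \cap \ker(\rho) = \{0\}$. Also, if $v \in V$, then

$$v = (v - \rho v) + \rho v \in \ker(\rho) \oplus \operatorname{im}(\rho)$$


and so $V = \ker(\rho) \oplus \operatorname{im}(\rho)$. Finally, $\rho(\rho x) = \rho^2 x = \rho x$ and so $\rho|_{\operatorname{im}(\rho)} = \iota$.
Hence, $\rho$ is projection onto $\operatorname{im}(\rho)$ along $\ker(\rho)$. □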


Projections and Invariance 


Projections can be used to characterize invariant subspaces. Let $\tau \in \mathcal{L}(V)$ and
let $S$ be a subspace of $V$. Let $\rho = \rho_{S,T}$ for any complement $T$ of $S$. The key is
that the elements of $S$ can be characterized as those vectors fixed by $\rho$, that is,
$s \in S$ if and only if $\rho s = s$. Hence, the following are equivalent:


$$\begin{aligned}
&\tau S \subseteq S \\
&\tau s \in S \text{ for all } s \in S \\
&\rho(\tau s) = \tau s \text{ for all } s \in S \\
&\rho(\tau\rho s) = \tau\rho s \text{ for all } s \in S
\end{aligned}$$


Thus, $S$ is $\tau$-invariant if and only if $\rho\tau\rho = \tau\rho$ for all vectors $s \in S$. But this is
also true for all vectors in $T$, since both sides are equal to 0 on $T$. This proves
the following theorem.


Theorem 2.23 Let $\tau \in \mathcal{L}(V)$. Then a subspace $S$ of $V$ is $\tau$-invariant if and only
if there is a projection $\rho = \rho_{S,T}$ for which

$$\rho\tau\rho = \tau\rho$$


in which case this holds for all projections of the form $\rho = \rho_{S,T}$. □

We also have the following relationship between projections and reducing pairs. 


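Theorem 2.24 Let $V = S \oplus T$. Then $(S,T)$ reduces $\tau \in \mathcal{L}(V)$ if and only if $\tau$
commutes with $\rho_{S,T}$.
Proof. Since $\rho_{T,S} = \iota - \rho_{S,T}$, Theorem 2.23 implies that $S$ and $T$ are $\tau$-invariant if and only if


$$\rho_{S,T}\tau\rho_{S,T} = \tau\rho_{S,T} \quad\text{and}\quad (\iota - \rho_{S,T})\tau(\iota - \rho_{S,T}) = \tau(\iota - \rho_{S,T})$$

and a little algebra shows that this is equivalent to

$$\rho_{S,T}\tau\rho_{S,T} = \tau\rho_{S,T} \quad\text{and}\quad \rho_{S,T}\tau = \tau\rho_{S,T}$$

which is equivalent to $\rho_{S,T}\tau = \tau\rho_{S,T}$. □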
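The criteria of Theorems 2.23 and 2.24 are easy to test on small matrices. The sketch below is an illustration under assumptions: $V = \mathbb{Q}^2$, $S$ is the $x$-axis, $T$ is the diagonal, and the two operators `tau_reduced` and `tau_shear` are hypothetical examples chosen so that one is reduced by $(S,T)$ and the other only leaves $S$ invariant.

```python
def matmul(P, Q):
    """Product of two 2 x 2 matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def leaves_S_invariant(rho, tau):
    """Criterion of Theorem 2.23: rho tau rho == tau rho, where rho projects onto S."""
    return matmul(rho, matmul(tau, rho)) == matmul(tau, rho)

def reduces(rho, tau):
    """Criterion of Theorem 2.24: tau commutes with rho = rho_{S,T}."""
    return matmul(rho, tau) == matmul(tau, rho)

# Projection onto S = span{(1, 0)} along T = span{(1, 1)}: (x, y) -> (x - y, 0).
rho = [[1, -1], [0, 0]]

# Hypothetical operators on Q^2, written in the standard basis.
tau_reduced = [[2, 1], [0, 3]]   # maps (1,0) to (2,0) and (1,1) to (3,3)
tau_shear = [[1, 1], [0, 1]]     # fixes (1,0) but maps (1,1) to (2,1)
```

With these choices both operators leave $S$ invariant, but only `tau_reduced` commutes with `rho`, so only it is reduced by $(S,T)$.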
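

Orthogonal Projections and Resolutions of the Identity


Observe that if $\rho$ is a projection, then

$$\rho(\iota - \rho) = (\iota - \rho)\rho = 0$$


Definition Two projections $\rho, \sigma \in \mathcal{L}(V)$ are orthogonal, written $\rho \perp \sigma$, if
$\rho\sigma = \sigma\rho = 0$. □


Note that $\rho \perp \sigma$ if and only if

$$\operatorname{im}(\rho) \subseteq \ker(\sigma) \quad\text{and}\quad \operatorname{im}(\sigma) \subseteq \ker(\rho)$$


The following example shows that it is not enough to have $\rho\sigma = 0$ in the
definition of orthogonality. In fact, it is possible for $\rho\sigma = 0$ and yet $\sigma\rho$ is not
even a projection.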




Example 2.5 Let $V = F^2$ and consider the $X$- and $Y$-axes and the diagonal:

$$\begin{aligned}
X &= \{(x,0) \mid x \in F\} \\
Y &= \{(0,y) \mid y \in F\} \\
D &= \{(x,x) \mid x \in F\}
\end{aligned}$$


Then


$$\rho_{D,X}\rho_{D,Y} = \rho_{D,Y} \ne \rho_{D,X} = \rho_{D,Y}\rho_{D,X}$$


From this we deduce that if $\rho$ and $\sigma$ are projections, it may happen that both
products $\rho\sigma$ and $\sigma\rho$ are projections, but that they are not equal. We leave it to
the reader to show that $\rho_{Y,X}\rho_{X,D} = 0$ (which is a projection), but that $\rho_{X,D}\rho_{Y,X}$
is not a projection. □
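The claims of Example 2.5 can be verified with the standard matrices of the four projections. This is a sketch rather than a proof: it uses integer entries, so it covers fields such as $\mathbb{Q}$, and the variable names are hypothetical. Idempotence is used as the test for being a projection, as in Theorem 2.22.

```python
def matmul(P, Q):
    """Product of two 2 x 2 matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def is_projection(P):
    """A matrix represents a projection exactly when it is idempotent."""
    return matmul(P, P) == P

# Standard matrices acting on column vectors (x, y).
rho_DX = [[0, 1], [0, 1]]    # (x, y) -> (y, y): onto D along X
rho_DY = [[1, 0], [1, 0]]    # (x, y) -> (x, x): onto D along Y
rho_YX = [[0, 0], [0, 1]]    # (x, y) -> (0, y): onto Y along X
rho_XD = [[1, -1], [0, 0]]   # (x, y) -> (x - y, 0): onto X along D
ZERO = [[0, 0], [0, 0]]

first = matmul(rho_DX, rho_DY)     # expected to equal rho_DY
second = matmul(rho_DY, rho_DX)    # expected to equal rho_DX
third = matmul(rho_YX, rho_XD)     # expected to be the zero operator
fourth = matmul(rho_XD, rho_YX)    # expected not to be idempotent
```

The product `fourth` sends $(x,y)$ to $(-y,0)$, whose square is zero, which is why it fails to be a projection.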


Since a projection $\rho$ is idempotent, we can write the identity operator $\iota$ as a sum
of two orthogonal projections:

$$\rho + (\iota - \rho) = \iota, \quad \rho \perp (\iota - \rho)$$


Let us generalize this to more than two projections. 


Definition A resolution of the identity on $V$ is a sum of the form

$$\rho_1 + \cdots + \rho_k = \iota$$

where the $\rho_i$'s are pairwise orthogonal projections, that is, $\rho_i \perp \rho_j$ for $i \ne j$. □


There is a connection between the resolutions of the identity on V and direct 
sum decompositions of V. In general terms, if 


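$$\sigma_1 + \cdots + \sigma_k = \iota$$

for any linear operators $\sigma_i \in \mathcal{L}(V)$, then for all $v \in V$,

$$v = \sigma_1 v + \cdots + \sigma_k v \in \operatorname{im}(\sigma_1) + \cdots + \operatorname{im}(\sigma_k)$$

and so

$$V = \operatorname{im}(\sigma_1) + \cdots + \operatorname{im}(\sigma_k)$$

However, the sum need not be direct.
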
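
Theorem 2.25 Let $V$ be a vector space. Resolutions of the identity on $V$
correspond to direct sum decompositions of $V$ as follows:
1) If $\rho_1 + \cdots + \rho_k = \iota$ is a resolution of the identity, then


$$V = \operatorname{im}(\rho_1) \oplus \cdots \oplus \operatorname{im}(\rho_k)$$


and $\rho_i$ is projection onto $\operatorname{im}(\rho_i)$ along


$$\ker(\rho_i) = \bigoplus_{j \ne i} \operatorname{im}(\rho_j)$$

2) Conversely, if

$$V = S_1 \oplus \cdots \oplus S_k$$


and if $\rho_i$ is projection onto $S_i$ along the direct sum $\bigoplus_{j \ne i} S_j$, then
$\rho_1 + \cdots + \rho_k = \iota$ is a resolution of the identity.
Proof. To prove 1), if $\rho_1 + \cdots + \rho_k = \iota$ is a resolution of the identity, then


$$V = \operatorname{im}(\rho_1) + \cdots + \operatorname{im}(\rho_k)$$

Moreover, if

$$\rho_1 x_1 + \cdots + \rho_k x_k = 0$$


then applying $\rho_i$ gives $\rho_i x_i = 0$ and so the sum is direct. As to the kernel of $\rho_i$,
we have


$$\operatorname{im}(\rho_i) \oplus \ker(\rho_i) = V = \operatorname{im}(\rho_i) \oplus \Bigl(\bigoplus_{j \ne i} \operatorname{im}(\rho_j)\Bigr)$$


and since $\rho_i\rho_j = 0$, it follows that


$$\bigoplus_{j \ne i} \operatorname{im}(\rho_j) \subseteq \ker(\rho_i)$$

and so equality must hold. For part 2), suppose that

$$V = S_1 \oplus \cdots \oplus S_k$$

and $\rho_i$ is projection onto $S_i$ along $\bigoplus_{j \ne i} S_j$. If $i \ne j$, then

$$\operatorname{im}(\rho_i) = S_i \subseteq \ker(\rho_j)$$

and so $\rho_i \perp \rho_j$. Also, if $v = s_1 + \cdots + s_k$ for $s_i \in S_i$, then

$$v = s_1 + \cdots + s_k = \rho_1 v + \cdots + \rho_k v = (\rho_1 + \cdots + \rho_k)v$$

and so $\iota = \rho_1 + \cdots + \rho_k$ is a resolution of the identity. □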
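As a small worked illustration, not taken from the text, consider $V = F^2$ with the $X$-axis $X = \{(x,0)\}$ and the diagonal $D = \{(x,x)\}$, so that $F^2 = X \oplus D$. Writing the projections as standard matrices acting on column vectors gives a resolution of the identity with $k = 2$:

```latex
\rho_{X,D} = \begin{pmatrix} 1 & -1 \\ 0 & 0 \end{pmatrix}, \qquad
\rho_{D,X} = \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix}

\rho_{X,D} + \rho_{D,X} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \iota, \qquad
\rho_{X,D}\,\rho_{D,X} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} = \rho_{D,X}\,\rho_{X,D}
```

Here $\operatorname{im}(\rho_{X,D}) = X = \ker(\rho_{D,X})$ and $\operatorname{im}(\rho_{D,X}) = D = \ker(\rho_{X,D})$, in agreement with part 1) of Theorem 2.25.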
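

The Algebra of Projections


If $\rho$ and $\sigma$ are projections, it does not necessarily follow that $\rho + \sigma$, $\rho - \sigma$ or $\rho\sigma$
is a projection. For example, the sum $\rho + \sigma$ is a projection if and only if


$$(\rho + \sigma)^2 = \rho + \sigma$$


which is equivalent to

$$\rho\sigma = -\sigma\rho$$


Of course, this holds if $\rho\sigma = \sigma\rho = 0$, that is, if $\rho \perp \sigma$. But the converse is also
true, provided that $\operatorname{char}(F) \ne 2$. To see this, we simply evaluate $\rho\sigma\rho$ in two
ways:


$$(\rho\sigma)\rho = -(\sigma\rho)\rho = -\sigma\rho$$

and

$$\rho(\sigma\rho) = -\rho(\rho\sigma) = -\rho\sigma$$


Hence, $\sigma\rho = \rho\sigma = -\sigma\rho$ and so $\sigma\rho = 0$. It follows that $\rho\sigma = -\sigma\rho = 0$ and so
$\rho \perp \sigma$. Thus, for $\operatorname{char}(F) \ne 2$, we have $\rho + \sigma$ is a projection if and only if
$\rho \perp \sigma$.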


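Now suppose that $\rho + \sigma$ is a projection. For the kernel of $\rho + \sigma$, note that

$$(\rho + \sigma)v = 0 \;\Rightarrow\; \rho(\rho + \sigma)v = 0 \;\Rightarrow\; \rho v = 0$$


and similarly, $\sigma v = 0$. Hence, $\ker(\rho + \sigma) \subseteq \ker(\rho) \cap \ker(\sigma)$. But the reverse
inclusion is obvious and so


$$\ker(\rho + \sigma) = \ker(\rho) \cap \ker(\sigma)$$

As to the image of $\rho + \sigma$, we have

$$v \in \operatorname{im}(\rho + \sigma) \;\Rightarrow\; v = (\rho + \sigma)v = \rho v + \sigma v \in \operatorname{im}(\rho) + \operatorname{im}(\sigma)$$


and so $\operatorname{im}(\rho + \sigma) \subseteq \operatorname{im}(\rho) + \operatorname{im}(\sigma)$. For the reverse inclusion, if $v = \rho x + \sigma y$,
then


$$(\rho + \sigma)v = (\rho + \sigma)(\rho x + \sigma y) = \rho x + \sigma y = v$$


and so $v \in \operatorname{im}(\rho + \sigma)$. Thus, $\operatorname{im}(\rho + \sigma) = \operatorname{im}(\rho) + \operatorname{im}(\sigma)$. Finally, $\rho\sigma = 0$
implies that $\operatorname{im}(\sigma) \subseteq \ker(\rho)$ and so the sum is direct and

$$\operatorname{im}(\rho + \sigma) = \operatorname{im}(\rho) \oplus \operatorname{im}(\sigma)$$


The following theorem also describes the situation for the difference and
product. Proof in these cases is left for the exercises.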


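Theorem 2.26 Let $V$ be a vector space over a field $F$ of characteristic $\ne 2$ and
let $\rho$ and $\sigma$ be projections.
1) The sum $\rho + \sigma$ is a projection if and only if $\rho \perp \sigma$, in which case


$$\operatorname{im}(\rho + \sigma) = \operatorname{im}(\rho) \oplus \operatorname{im}(\sigma) \quad\text{and}\quad \ker(\rho + \sigma) = \ker(\rho) \cap \ker(\sigma)$$

2) The difference $\rho - \sigma$ is a projection if and only if


$$\rho\sigma = \sigma\rho = \sigma$$


in which case

$$\operatorname{im}(\rho - \sigma) = \operatorname{im}(\rho) \cap \ker(\sigma) \quad\text{and}\quad \ker(\rho - \sigma) = \ker(\rho) \oplus \operatorname{im}(\sigma)$$

3) If $\rho$ and $\sigma$ commute, then $\rho\sigma$ is a projection, in which case

$$\operatorname{im}(\rho\sigma) = \operatorname{im}(\rho) \cap \operatorname{im}(\sigma) \quad\text{and}\quad \ker(\rho\sigma) = \ker(\rho) + \ker(\sigma)$$

(Example 2.5 shows that the converse may be false.) □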


Topological Vector Spaces


This section is for readers with some familiarity with point-set topology.


The Definition


A pair $(V, \mathcal{T})$ where $V$ is a real vector space and $\mathcal{T}$ is a topology on the set
$V$ is called a topological vector space if the operations of addition


$$\mathcal{A}\colon V \times V \to V, \quad \mathcal{A}(v,w) = v + w$$

and scalar multiplication

$$\mathcal{M}\colon \mathbb{R} \times V \to V, \quad \mathcal{M}(r,v) = rv$$

are continuous functions.


The Standard Topology on $\mathbb{R}^n$


The vector space $\mathbb{R}^n$ is a topological vector space under the standard topology,
which is the topology for which the set of open rectangles


$$\mathcal{B} = \{I_1 \times \cdots \times I_n \mid I_i\text{'s are open intervals in } \mathbb{R}\}$$


is a base, that is, a subset of $\mathbb{R}^n$ is open if and only if it is a union of open
rectangles. The standard topology is also the topology induced by the Euclidean
metric on $\mathbb{R}^n$, since an open rectangle is the union of Euclidean open balls and
an open ball is the union of open rectangles.

The standard topology on $\mathbb{R}^n$ has the property that the addition function

$$\mathcal{A}\colon \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n\colon (v,w) \mapsto v + w$$

and the scalar multiplication function

$$\mathcal{M}\colon \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n\colon (r,v) \mapsto rv$$

are continuous and so $\mathbb{R}^n$ is a topological vector space under this topology.
Also, the linear functionals $f\colon \mathbb{R}^n \to \mathbb{R}$ are continuous maps.
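For example, to see that addition is continuous, if


$$(u_1,\dots,u_n) + (v_1,\dots,v_n) \in (a_1,b_1) \times \cdots \times (a_n,b_n) \in \mathcal{B}$$


then $u_i + v_i \in (a_i,b_i)$ and so there is an $\epsilon > 0$ for which

$$(u_i - \epsilon, u_i + \epsilon) + (v_i - \epsilon, v_i + \epsilon) \subseteq (a_i,b_i)$$

for all $i$. It follows that if

$$(u_1,\dots,u_n) \in I := (u_1 - \epsilon, u_1 + \epsilon) \times \cdots \times (u_n - \epsilon, u_n + \epsilon) \in \mathcal{B}$$

and

$$(v_1,\dots,v_n) \in J := (v_1 - \epsilon, v_1 + \epsilon) \times \cdots \times (v_n - \epsilon, v_n + \epsilon) \in \mathcal{B}$$

then

$$(u_1,\dots,u_n) + (v_1,\dots,v_n) \in \mathcal{A}(I,J) \subseteq (a_1,b_1) \times \cdots \times (a_n,b_n)$$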


The Natural Topology on V 


Now let $V$ be a real vector space of dimension $n$ and fix an ordered basis
$\mathcal{B} = (v_1,\dots,v_n)$ for $V$. We wish to show that there is precisely one topology $\mathcal{T}$
on $V$ for which $(V,\mathcal{T})$ is a topological vector space and all linear functionals
are continuous. This topology is called the natural topology on $V$.


Our plan is to show that if $(V,\mathcal{T})$ is a topological vector space and if all linear
functionals on $V$ are continuous, then the coordinate map $\phi_\mathcal{B}\colon V \approx \mathbb{R}^n$ is a
homeomorphism. This implies that if $\mathcal{T}$ does exist, it must be unique. Then we
use $\psi = \phi_\mathcal{B}^{-1}$ to move the standard topology from $\mathbb{R}^n$ to $V$, thus giving $V$ a
topology $\mathcal{T}$ for which $\phi_\mathcal{B}$ is a homeomorphism. Finally, we show that $(V,\mathcal{T})$ is
a topological vector space and that all linear functionals on $V$ are continuous.


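The first step is to show that if $(V,\mathcal{T})$ is a topological vector space, then $\psi$ is
continuous. Since $\psi = \sum \psi_i$ where $\psi_i\colon \mathbb{R}^n \to V$ is defined by


$$\psi_i(a_1,\dots,a_n) = a_i v_i$$


it is sufficient to show that these maps are continuous. (The sum of continuous
maps is continuous.) Let $O$ be an open set in $\mathcal{T}$. Then


$$\mathcal{M}^{-1}(O) = \{(r,x) \in \mathbb{R} \times V \mid rx \in O\}$$


is open in $\mathbb{R} \times V$. This implies that if $rx \in O$, then there is an open interval
$I \subseteq \mathbb{R}$ containing $r$ for which


$$Ix = \{sx \mid s \in I\} \subseteq O$$

We need to show that the set $\psi_i^{-1}(O)$ is open. But


$$\begin{aligned}
\psi_i^{-1}(O) &= \{(a_1,\dots,a_n) \in \mathbb{R}^n \mid a_i v_i \in O\} \\
&= \mathbb{R} \times \cdots \times \mathbb{R} \times \{a_i \in \mathbb{R} \mid a_i v_i \in O\} \times \mathbb{R} \times \cdots \times \mathbb{R}
\end{aligned}$$


In words, an $n$-tuple $(a_1,\dots,a_n)$ is in $\psi_i^{-1}(O)$ if the $i$th coordinate $a_i$ times $v_i$ is
in $O$. But if $a_i v_i \in O$, then there is an open interval $I \subseteq \mathbb{R}$ for which $a_i \in I$ and
$Iv_i \subseteq O$. Hence, the entire open set


$$U = \mathbb{R} \times \cdots \times \mathbb{R} \times I \times \mathbb{R} \times \cdots \times \mathbb{R}$$


where the factor $I$ is in the $i$th position is in $\psi_i^{-1}(O)$, that is,

$$(a_1,\dots,a_n) \in U \subseteq \psi_i^{-1}(O)$$

Thus, $\psi_i^{-1}(O)$ is open and $\psi_i$, and therefore also $\psi$, is continuous.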
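
Next we show that if every linear functional on $V$ is continuous under a
topology $\mathcal{T}$ on $V$, then the coordinate map $\phi$ is continuous. If $v \in V$ denote by
$[v]_{\mathcal{B},i}$ the $i$th coordinate of $[v]_\mathcal{B}$. The map $\mu\colon V \to \mathbb{R}$ defined by $\mu v = [v]_{\mathcal{B},i}$ is a
linear functional and so is continuous by assumption. Hence, for any open
interval $I_i \subseteq \mathbb{R}$ the set


$$A_i = \{v \in V \mid [v]_{\mathcal{B},i} \in I_i\}$$

is open. Now, if $I_i$ are open intervals in $\mathbb{R}$, then

$$\phi^{-1}(I_1 \times \cdots \times I_n) = \{v \in V \mid [v]_\mathcal{B} \in I_1 \times \cdots \times I_n\} = \bigcap A_i$$


is open. Thus, $\phi$ is continuous.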


We have shown that if a topology $\mathcal{T}$ has the property that $(V,\mathcal{T})$ is a
topological vector space under which every linear functional is continuous, then
$\phi$ and $\psi = \phi^{-1}$ are homeomorphisms. This means that if $\mathcal{T}$ exists, its open sets
must be the images under $\psi$ of the open sets in the standard topology of $\mathbb{R}^n$. It
remains to prove that the topology $\mathcal{T}$ on $V$ that makes $\phi$ a homeomorphism
makes $(V,\mathcal{T})$ a topological vector space for which any linear functional $f$ on $V$
is continuous.


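The addition map on $V$ is a composition

$$\mathcal{A} = \phi^{-1} \circ \mathcal{A}' \circ (\phi \times \phi)$$

where $\mathcal{A}'\colon \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$ is addition in $\mathbb{R}^n$ and since each of the maps on the
right is continuous, so is $\mathcal{A}$.

Similarly, scalar multiplication in $V$ is

$$\mathcal{M} = \phi^{-1} \circ \mathcal{M}' \circ (\iota \times \phi)$$

where $\mathcal{M}'\colon \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n$ is scalar multiplication in $\mathbb{R}^n$. Hence, $\mathcal{M}$ is
continuous.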


Now let $f$ be a linear functional. Since $f$ is continuous if and only if $f \circ \phi^{-1}$ is
continuous, we can confine attention to $V = \mathbb{R}^n$. In this case, if $e_1,\dots,e_n$ is the
standard basis for $\mathbb{R}^n$ and $|f(e_i)| \le M$ for all $i$, then for any
$x = (a_1,\dots,a_n) \in \mathbb{R}^n$, we have

$$|f(x)| = \Bigl|\sum a_i f(e_i)\Bigr| \le \sum |a_i||f(e_i)| \le M \sum |a_i|$$


Now, if $|x| < \epsilon/Mn$, then $|a_i| < \epsilon/Mn$ and so $|f(x)| < \epsilon$, which implies that $f$
is continuous at $x = 0$.


According to the Riesz representation theorem (Theorem 9.18) and the Cauchy-
Schwarz inequality, we have


$$|f(x)| \le \|R_f\|\,\|x\|$$


where $R_f \in \mathbb{R}^n$. Hence, $x_n \to 0$ implies $f(x_n) \to 0$ and so by linearity, $x_n \to x$
implies $f(x_n) \to f(x)$ and so $f$ is continuous at all $x$.


Theorem 2.27 Let $V$ be a real vector space of dimension $n$. There is a unique
topology on $V$, called the natural topology, for which $V$ is a topological vector
space and for which all linear functionals on $V$ are continuous. This topology is
determined by the fact that the coordinate map $\phi\colon V \to \mathbb{R}^n$ is a
homeomorphism, where $\mathbb{R}^n$ has the standard topology induced by the Euclidean
metric. □


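Linear Operators on $V^{\mathbb{C}}$


A linear operator $\tau$ on a real vector space $V$ can be extended to a linear operator
$\tau^{\mathbb{C}}$ on the complexification $V^{\mathbb{C}}$ by defining


$$\tau^{\mathbb{C}}(u + vi) = \tau(u) + \tau(v)i$$

Here are the basic properties of this complexification of $\tau$.


Theorem 2.28 If $\tau, \sigma \in \mathcal{L}(V)$, then
1) $(a\tau)^{\mathbb{C}} = a\tau^{\mathbb{C}}$, $a \in \mathbb{R}$


Let us recall that for any ordered basis $\mathcal{B}$ for $V$ and any vector $v \in V$ we have

$$[v + 0i]_{\operatorname{cpx}(\mathcal{B})} = [v]_\mathcal{B}$$

Now, if $\mathcal{B}$ is an ordered basis for $V$, then the $i$th column of $[\tau]_\mathcal{B}$ is

$$[\tau b_i]_\mathcal{B} = [\tau b_i + 0i]_{\operatorname{cpx}(\mathcal{B})} = [\tau^{\mathbb{C}}(b_i + 0i)]_{\operatorname{cpx}(\mathcal{B})}$$


which is the $i$th column of the coordinate matrix of $\tau^{\mathbb{C}}$ with respect to the basis
$\operatorname{cpx}(\mathcal{B})$. Thus we have the following theorem.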


Theorem 2.29 Let $\tau \in \mathcal{L}(V)$ where $V$ is a real vector space. The matrix of $\tau^{\mathbb{C}}$
with respect to the ordered basis $\operatorname{cpx}(\mathcal{B})$ is equal to the matrix of $\tau$ with respect
to the ordered basis $\mathcal{B}$:


$$[\tau^{\mathbb{C}}]_{\operatorname{cpx}(\mathcal{B})} = [\tau]_\mathcal{B}$$


Hence, if a real matrix $A$ represents a linear operator $\tau$ on $V$, then $A$ also
represents the complexification $\tau^{\mathbb{C}}$ of $\tau$ on $V^{\mathbb{C}}$. □
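Theorem 2.29 can be illustrated in coordinates. The sketch below assumes $V = \mathbb{R}^2$ with its standard basis and a hypothetical real matrix `A`; it builds $\tau^{\mathbb{C}}$ from the defining formula $\tau^{\mathbb{C}}(u + vi) = \tau(u) + \tau(v)i$ and compares the result with multiplying the complex coordinate vector by the same matrix.

```python
def apply(A, x):
    """Multiply a matrix (list of rows) by a vector; entries may be real or complex."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def complexification(A):
    """Return tau^C, where tau is multiplication by the real matrix A."""
    def tau_c(z):
        u = [w.real for w in z]          # real part of z, a vector in V
        v = [w.imag for w in z]          # imaginary part of z, a vector in V
        return [complex(a, b) for a, b in zip(apply(A, u), apply(A, v))]
    return tau_c

A = [[0.0, -1.0], [1.0, 0.0]]            # hypothetical real matrix (a quarter turn)
tau_c = complexification(A)
z = [1 + 2j, 3 - 1j]                     # a vector u + vi with u = (1, 3), v = (2, -1)
```

For this input, `tau_c(z)` and `apply(A, z)` give the same complex vector, which is the coordinate form of the statement that `A` also represents $\tau^{\mathbb{C}}$.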


Exercises 


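1. Let $A \in \mathcal{M}_{m,n}$ have rank $k$. Prove that there are matrices $X \in \mathcal{M}_{m,k}$ and
   $Y \in \mathcal{M}_{k,n}$, both of rank $k$, for which $A = XY$. Prove that $A$ has rank 1 if
   and only if it has the form $A = x^t y$ where $x$ and $y$ are row matrices.

2. Prove Corollary 2.9 and find an example to show that the corollary does not
   hold without the finiteness condition.

3. Let $\tau \in \mathcal{L}(V,W)$. Prove that $\tau$ is an isomorphism if and only if it carries a
   basis for $V$ to a basis for $W$.

4. If $\tau \in \mathcal{L}(V_1,W_1)$ and $\sigma \in \mathcal{L}(V_2,W_2)$ we define the external direct sum
   $\tau \boxplus \sigma \in \mathcal{L}(V_1 \boxplus V_2, W_1 \boxplus W_2)$ by

   $$(\tau \boxplus \sigma)((v_1,v_2)) = (\tau v_1, \sigma v_2)$$

   Show that $\tau \boxplus \sigma$ is a linear transformation.

5. Let $V = S \oplus T$. Prove that $S \oplus T \approx S \boxplus T$. Thus, internal and external
   direct sums are equivalent up to isomorphism.

6. Let $V = A + B$ and consider the external direct sum $E = A \boxplus B$. Define a
   map $\tau\colon A \boxplus B \to V$ by $\tau(v,w) = v + w$. Show that $\tau$ is linear. What is the
   kernel of $\tau$? When is $\tau$ an isomorphism?

7. Let $\tau \in \mathcal{L}(V)$ where $\dim(V) = n < \infty$. Let $A \in \mathcal{M}_n(F)$. Suppose that
   there is an isomorphism $\sigma\colon V \approx F^n$ with the property that $\sigma(\tau v) = A(\sigma v)$.
   Prove that there is an ordered basis $\mathcal{B}$ for which $A = [\tau]_\mathcal{B}$.

8. Let $\mathcal{T}$ be a subset of $\mathcal{L}(V)$. A subspace $S$ of $V$ is $\mathcal{T}$-invariant if $S$ is $\tau$-invariant
   for every $\tau \in \mathcal{T}$. Also, $V$ is $\mathcal{T}$-irreducible if the only $\mathcal{T}$-invariant
   subspaces of $V$ are $\{0\}$ and $V$. Prove the following form of Schur's lemma.
   Suppose that $\mathcal{T}_V \subseteq \mathcal{L}(V)$ and $\mathcal{T}_W \subseteq \mathcal{L}(W)$ and $V$ is $\mathcal{T}_V$-irreducible and $W$
   is $\mathcal{T}_W$-irreducible. Let $\alpha \in \mathcal{L}(V,W)$ satisfy $\alpha\mathcal{T}_V = \mathcal{T}_W\alpha$, that is, for any
   $\mu \in \mathcal{T}_V$ there is a $\lambda \in \mathcal{T}_W$ such that $\alpha\mu = \lambda\alpha$ and for any $\lambda \in \mathcal{T}_W$ there is a
   $\mu \in \mathcal{T}_V$ such that $\alpha\mu = \lambda\alpha$. Prove that $\alpha = 0$ or $\alpha$ is an isomorphism.

9. Let $\tau \in \mathcal{L}(V)$ where $\dim(V) < \infty$. If $\operatorname{rk}(\tau^2) = \operatorname{rk}(\tau)$ show that
   $\operatorname{im}(\tau) \cap \ker(\tau) = \{0\}$.

10. Let $\tau \in \mathcal{L}(U,V)$ and $\sigma \in \mathcal{L}(V,W)$. Show that

    $$\operatorname{rk}(\sigma\tau) \le \min\{\operatorname{rk}(\tau), \operatorname{rk}(\sigma)\}$$

11. Let $\tau \in \mathcal{L}(U,V)$ and $\sigma \in \mathcal{L}(V,W)$. Show that

    $$\operatorname{null}(\sigma\tau) \le \operatorname{null}(\tau) + \operatorname{null}(\sigma)$$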

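12. Let $\tau, \sigma \in \mathcal{L}(V)$ where $\tau$ is invertible. Show that

    $$\operatorname{rk}(\tau\sigma) = \operatorname{rk}(\sigma\tau) = \operatorname{rk}(\sigma)$$

13. Let $\tau, \sigma \in \mathcal{L}(V,W)$. Show that

    $$\operatorname{rk}(\tau + \sigma) \le \operatorname{rk}(\tau) + \operatorname{rk}(\sigma)$$

14. Let $S$ be a subspace of $V$. Show that there is a $\tau \in \mathcal{L}(V)$ for which
    $\ker(\tau) = S$. Show also that there exists a $\sigma \in \mathcal{L}(V)$ for which $\operatorname{im}(\sigma) = S$.

15. Suppose that $\tau, \sigma \in \mathcal{L}(V)$.
    a) Show that $\sigma = \tau\mu$ for some $\mu \in \mathcal{L}(V)$ if and only if $\operatorname{im}(\sigma) \subseteq \operatorname{im}(\tau)$.
    b) Show that $\sigma = \mu\tau$ for some $\mu \in \mathcal{L}(V)$ if and only if $\ker(\tau) \subseteq \ker(\sigma)$.

16. Let $\dim(V) < \infty$ and suppose that $\tau \in \mathcal{L}(V)$ satisfies $\tau^2 = 0$. Show that
    $2\operatorname{rk}(\tau) \le \dim(V)$.

17. Let $A$ be an $m \times n$ matrix over $F$. What is the relationship between the
    linear transformation $\tau_A\colon F^n \to F^m$ and the system of equations $AX = B$?
    Use your knowledge of linear transformations to state and prove various
    results concerning the system $AX = B$, especially when $B = 0$.

18. Let $V$ have basis $\mathcal{B} = \{v_1,\dots,v_n\}$ and assume that the base field $F$ for $V$
    has characteristic 0. Suppose that for each $1 \le i, j \le n$ we define
    $\tau_{i,j} \in \mathcal{L}(V)$ by

    $$\tau_{i,j}(v_k) = \begin{cases} v_k & \text{if } k \ne i \\ v_k + v_j & \text{if } k = i \end{cases}$$

    Prove that the $\tau_{i,j}$ are invertible and form a basis for $\mathcal{L}(V)$.

19. Let $\tau \in \mathcal{L}(V)$. If $S$ is a $\tau$-invariant subspace of $V$ must there be a subspace
    $T$ of $V$ for which $(S,T)$ reduces $\tau$?

20. Find an example of a vector space $V$ and a proper subspace $S$ of $V$ for
    which $V \approx S$.

21. Let $\dim(V) < \infty$. If $\tau, \sigma \in \mathcal{L}(V)$ prove that $\sigma\tau = \iota$ implies that $\tau$ and $\sigma$
    are invertible and that $\sigma = p(\tau)$ for some polynomial $p(x) \in F[x]$.

22. Let $\tau \in \mathcal{L}(V)$. If $\tau\sigma = \sigma\tau$ for all $\sigma \in \mathcal{L}(V)$ show that $\tau = a\iota$, for some
    $a \in F$, where $\iota$ is the identity map.

23. Let $V$ be a vector space over a field $F$ of characteristic $\ne 2$ and let $\rho$ and $\sigma$
    be projections. Prove the following:
    a) The difference $\rho - \sigma$ is a projection if and only if

       $$\rho\sigma = \sigma\rho = \sigma$$

       in which case

       $$\operatorname{im}(\rho - \sigma) = \operatorname{im}(\rho) \cap \ker(\sigma) \quad\text{and}\quad \ker(\rho - \sigma) = \ker(\rho) \oplus \operatorname{im}(\sigma)$$

       Hint: $\rho$ is a projection if and only if $\iota - \rho$ is a projection and so $\rho - \sigma$
       is a projection if and only if

       $$\theta = \iota - (\rho - \sigma) = (\iota - \rho) + \sigma$$

       is a projection.
    b) If $\rho$ and $\sigma$ commute, then $\rho\sigma$ is a projection, in which case

       $$\operatorname{im}(\rho\sigma) = \operatorname{im}(\rho) \cap \operatorname{im}(\sigma) \quad\text{and}\quad \ker(\rho\sigma) = \ker(\rho) + \ker(\sigma)$$
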
24. Let $f\colon \mathbb{R}^n \to \mathbb{R}$ be a continuous function with the property that

    $$f(x + y) = f(x) + f(y)$$

    Prove that $f$ is a linear functional on $\mathbb{R}^n$.

25. Prove that any linear functional $f\colon \mathbb{R}^n \to \mathbb{R}$ is a continuous map.

26. Prove that any subspace $S$ of $\mathbb{R}^n$ is a closed set or, equivalently, that
    $S^c = \mathbb{R}^n \setminus S$ is open, that is, for any $x \in S^c$ there is an open ball $B(x,\epsilon)$
    centered at $x$ with radius $\epsilon > 0$ for which $B(x,\epsilon) \subseteq S^c$.

27. Prove that any linear transformation $\tau\colon V \to W$ is continuous under the
    natural topologies of $V$ and $W$.

28. Prove that any surjective linear transformation $\tau$ from $V$ to $W$ (both finite-dimensional
    topological vector spaces under the natural topology) is an
    open map, that is, $\tau$ maps open sets to open sets.

29. Prove that any subspace $S$ of a finite-dimensional vector space $V$ is a
    closed set or, equivalently, that $S^c$ is open, that is, for any $x \in S^c$ there is
    an open ball $B(x,\epsilon)$ centered at $x$ with radius $\epsilon > 0$ for which
    $B(x,\epsilon) \subseteq S^c$.

30. Let $S$ be a subspace of $V$ with $\dim(V) < \infty$.
    a) Show that the subspace topology on $S$ inherited from $V$ is the natural
       topology.
    b) Show that the natural topology on $V/S$ is the topology for which the
       natural projection map $\pi\colon V \to V/S$ is continuous and open.

31. If $V$ is a real vector space, then $V^{\mathbb{C}}$ is a complex vector space. Thinking of
    $V^{\mathbb{C}}$ as a vector space $(V^{\mathbb{C}})_{\mathbb{R}}$ over $\mathbb{R}$, show that $(V^{\mathbb{C}})_{\mathbb{R}}$ is isomorphic to the
    external direct product $V \boxplus V$.

32. (When is a complex linear map a complexification?) Let $V$ be a real vector
    space with complexification $V^{\mathbb{C}}$ and let $\sigma \in \mathcal{L}(V^{\mathbb{C}})$. Prove that $\sigma$ is a
    complexification, that is, $\sigma$ has the form $\tau^{\mathbb{C}}$ for some $\tau \in \mathcal{L}(V)$ if and only
    if $\sigma$ commutes with the conjugate map $\chi\colon V^{\mathbb{C}} \to V^{\mathbb{C}}$ defined by
    $\chi(u + iv) = u - iv$.

33. Let $W$ be a complex vector space.
    a) Consider replacing the scalar multiplication on $W$ by the operation

       $$(z,w) \mapsto \bar{z}w$$

       where $z \in \mathbb{C}$ and $w \in W$. Show that the resulting set with the addition
       defined for the vector space $W$ and with this scalar multiplication is a
       complex vector space, which we denote by $\overline{W}$.
    b) Show, without using dimension arguments, that $(W_{\mathbb{R}})^{\mathbb{C}} \approx W \boxplus \overline{W}$.


Chapter 3 
The Isomorphism Theorems 


Quotient Spaces 


Let S be a subspace of a vector space V. It is easy to see that the binary relation 
on V defined by 


$$u \equiv v \iff u - v \in S$$

is an equivalence relation. When $u \equiv v$, we say that $u$ and $v$ are congruent modulo $S$. The term mod is used as a colloquialism for modulo and $u \equiv v$ is often written

$$u \equiv v \bmod S$$

When the subspace in question is clear, we will simply write $u \equiv v$.


To see what the equivalence classes look like, observe that 
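$$
\begin{aligned}
[v] &= \{u \in V \mid u \equiv v\} \\
&= \{u \in V \mid u - v \in S\} \\
&= \{u \in V \mid u = v + s \text{ for some } s \in S\} \\
&= \{v + s \mid s \in S\} \\
&= v + S
\end{aligned}
$$

The set

$$[v] = v + S = \{v + s \mid s \in S\}$$

is called a coset of $S$ in $V$ and $v$ is called a coset representative for $v + S$. (Thus, any member of a coset is a coset representative.)

The set of all cosets of $S$ in $V$ is denoted by

$$V/S = \{v + S \mid v \in V\}$$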


This is read “V mod S” and is called the quotient space of V modulo S. Of 


88 Advanced Linear Algebra 


course, the term space is a hint that we intend to define vector space operations 
on V/S. 


The natural choice for these vector space operations is 
(u+S)+(v+S)=(u+v)+S 
and 
r(u+S)=(ru)+ S 


but we must check that these operations are well-defined, that is, 


1) $u_1 + S = u_2 + S,\ v_1 + S = v_2 + S \Rightarrow (u_1 + v_1) + S = (u_2 + v_2) + S$
2) $u_1 + S = u_2 + S \Rightarrow ru_1 + S = ru_2 + S$


Equivalently, the equivalence relation $\equiv$ must be consistent with the vector space operations on $V$, that is,


3) $u_1 \equiv u_2,\ v_1 \equiv v_2 \Rightarrow (u_1 + v_1) \equiv (u_2 + v_2)$
4) $u_1 \equiv u_2 \Rightarrow ru_1 \equiv ru_2$


This scenario is a recurring one in algebra. An equivalence relation on an
algebraic structure, such as a group, ring, module or vector space is called a 
congruence relation if it preserves the algebraic operations. In the case of a 
vector space, these are conditions 3) and 4) above. 


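These conditions follow easily from the fact that $S$ is a subspace, for if $u_1 \equiv u_2$ and $v_1 \equiv v_2$, then

$$
\begin{aligned}
u_1 - u_2 \in S,\ v_1 - v_2 \in S &\Rightarrow r(u_1 - u_2) + s(v_1 - v_2) \in S \\
&\Rightarrow (ru_1 + sv_1) - (ru_2 + sv_2) \in S \\
&\Rightarrow ru_1 + sv_1 \equiv ru_2 + sv_2
\end{aligned}
$$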


which verifies both conditions at once. We leave it to the reader to verify that 
V/S is indeed a vector space over F under these well-defined operations. 
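
The well-definedness check can be made concrete with a small worked example. The sketch below assumes the illustrative choice $V = \mathbb{R}^2$ and $S = \operatorname{span}\{(1,1)\}$, which is not taken from the text; it only shows that changing coset representatives does not change the sum coset.

```latex
% Illustrative example (assumed data: V = R^2, S = span{(1,1)}); requires amsmath
\[
V = \mathbb{R}^2, \qquad S = \operatorname{span}\{(1,1)\}
\]
% Two representatives of the same coset differ by an element of S
\begin{align*}
(3,1) + S &= (2,0) + S  &&\text{since } (3,1) - (2,0) = (1,1) \in S \\
(0,1) + S &= (-1,0) + S &&\text{since } (0,1) - (-1,0) = (1,1) \in S
\end{align*}
% Adding with either choice of representatives gives the same coset
\begin{align*}
(3,1) + (0,1) &= (3,2), \qquad (2,0) + (-1,0) = (1,0) \\
(3,2) - (1,0) &= (2,2) = 2(1,1) \in S
  \quad\Longrightarrow\quad (3,2) + S = (1,0) + S
\end{align*}
```

Geometrically, in this example the cosets are the lines parallel to the line $y = x$, and addition of cosets adds the lines as translates.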


Actually, we are lucky here: For any subspace S of V, the quotient V/S is a 
vector space under the natural operations. In the case of groups, not all 
subgroups have this property. Indeed, it is precisely the normal subgroups N of 
G that have the property that the quotient G/N is a group. Also, for rings, it is 
precisely the ideals (not the subrings) that have the property that the quotient is 
a ring. 


Let us summarize. 


The Isomorphism Theorems 89 


Theorem 3.1 Let $S$ be a subspace of $V$. The binary relation

$$u \equiv v \iff u - v \in S$$

is an equivalence relation on $V$, whose equivalence classes are the cosets

$$v + S = \{v + s \mid s \in S\}$$

of $S$ in $V$. The set $V/S$ of all cosets of $S$ in $V$, called the quotient space of $V$ modulo $S$, is a vector space under the well-defined operations

$$r(u + S) = ru + S$$
$$(u + S) + (v + S) = (u + v) + S$$

The zero vector in $V/S$ is the coset $0 + S = S$. □

The Natural Projection and the Correspondence Theorem 


If $S$ is a subspace of $V$, then we can define a map $\pi_S\colon V \to V/S$ by sending each vector to the coset containing it:

$$\pi_S(v) = v + S$$

This map is called the canonical projection or natural projection of $V$ onto $V/S$, or simply projection modulo $S$. (Not to be confused with the projection operators $\rho_{S,T}$.) It is easily seen to be linear, for we have (writing $\pi$ for $\pi_S$)

$$\pi(ru + sv) = (ru + sv) + S = r(u + S) + s(v + S) = r\pi(u) + s\pi(v)$$

The canonical projection is clearly surjective. To determine the kernel of $\pi$, note that

$$v \in \ker(\pi) \iff \pi(v) = 0 \iff v + S = S \iff v \in S$$

and so

$$\ker(\pi) = S$$

Theorem 3.2 The canonical projection $\pi_S\colon V \to V/S$ defined by

$$\pi_S(v) = v + S$$

is a surjective linear transformation with $\ker(\pi_S) = S$. □


If $S$ is a subspace of $V$, then the subspaces of the quotient space $V/S$ have the form $T/S$ for some intermediate subspace $T$ satisfying $S \subseteq T \subseteq V$. In fact, as shown in Figure 3.1, the projection map $\pi_S$ provides a one-to-one correspondence between intermediate subspaces $S \subseteq T \subseteq V$ and subspaces of the quotient space $V/S$. The proof of the following theorem is left as an exercise.


Figure 3.1: The correspondence theorem


Theorem 3.3 (The correspondence theorem) Let $S$ be a subspace of $V$. Then the function that assigns to each intermediate subspace $S \subseteq T \subseteq V$ the subspace $T/S$ of $V/S$ is an order-preserving (with respect to set inclusion) one-to-one correspondence between the set of all subspaces of $V$ containing $S$ and the set of all subspaces of $V/S$.

Proof. We prove only that the correspondence is surjective. Let

$$X = \{u + S \mid u \in U\}$$

be a subspace of $V/S$ and let $T$ be the union of all cosets in $X$:

$$T = \bigcup_{u \in U} (u + S)$$

We show that $S \le T \le V$ and that $T/S = X$. If $x, y \in T$, then $x + S$ and $y + S$ are in $X$ and since $X \le V/S$, we have

$$rx + S,\ (x + y) + S \in X$$

which implies that $rx, x + y \in T$. Hence, $T$ is a subspace of $V$ containing $S$. Moreover, if $t + S \in T/S$, then $t \in T$ and so $t + S \in X$. Conversely, if $u + S \in X$, then $u \in T$ and therefore $u + S \in T/S$. Thus, $X = T/S$. □


The Universal Property of Quotients and the First 
Isomorphism Theorem 
Let $S$ be a subspace of $V$. The pair $(V/S, \pi_S)$ has a very special property, known as the universal property, a term that comes from the world of category theory.

Figure 3.2 shows a linear transformation $\tau \in \mathcal{L}(V, W)$, along with the canonical projection $\pi_S$ from $V$ to the quotient space $V/S$.


Figure 3.2: The universal property


The universal property states that if $\ker(\tau) \supseteq S$, then there is a unique $\tau'\colon V/S \to W$ for which

$$\tau' \circ \pi_S = \tau$$

Another way to say this is that any such $\tau \in \mathcal{L}(V, W)$ can be factored through the canonical projection $\pi_S$.


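Theorem 3.4 Let $S$ be a subspace of $V$ and let $\tau \in \mathcal{L}(V, W)$ satisfy $S \subseteq \ker(\tau)$. Then, as pictured in Figure 3.2, there is a unique linear transformation $\tau'\colon V/S \to W$ with the property that

$$\tau' \circ \pi_S = \tau$$

Moreover, $\ker(\tau') = \ker(\tau)/S$ and $\operatorname{im}(\tau') = \operatorname{im}(\tau)$.

Proof. We have no other choice but to define $\tau'$ by the condition $\tau' \circ \pi_S = \tau$, that is,

$$\tau'(v + S) = \tau v$$

This function is well-defined if and only if

$$v + S = u + S \Rightarrow \tau'(v + S) = \tau'(u + S)$$

which is equivalent to each of the following statements:

$$
\begin{aligned}
&v + S = u + S \Rightarrow \tau v = \tau u \\
&v - u \in S \Rightarrow \tau(v - u) = 0 \\
&x \in S \Rightarrow \tau x = 0 \\
&S \subseteq \ker(\tau)
\end{aligned}
$$

Thus, $\tau'\colon V/S \to W$ is well-defined. Also,

$$\operatorname{im}(\tau') = \{\tau'(v + S) \mid v \in V\} = \{\tau v \mid v \in V\} = \operatorname{im}(\tau)$$

and

$$
\begin{aligned}
\ker(\tau') &= \{v + S \mid \tau'(v + S) = 0\} \\
&= \{v + S \mid \tau v = 0\} \\
&= \{v + S \mid v \in \ker(\tau)\} \\
&= \ker(\tau)/S
\end{aligned}
$$

The uniqueness of $\tau'$ is evident. □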


Theorem 3.4 has a very important corollary, which is often called the first isomorphism theorem and is obtained by taking $S = \ker(\tau)$.

Theorem 3.5 (The first isomorphism theorem) Let $\tau\colon V \to W$ be a linear transformation. Then the linear transformation $\tau'\colon V/\ker(\tau) \to W$ defined by

$$\tau'(v + \ker(\tau)) = \tau v$$

is injective and

$$\frac{V}{\ker(\tau)} \approx \operatorname{im}(\tau)$$ □

According to Theorem 3.5, the image of any linear transformation on $V$ is isomorphic to a quotient space of $V$. Conversely, any quotient space $V/S$ of $V$ is the image of a linear transformation on $V$: the canonical projection $\pi_S$. Thus, up to isomorphism, quotient spaces are equivalent to homomorphic images.
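
As a hedged illustration of the first isomorphism theorem, the following worked example uses an assumed map $\tau\colon \mathbb{R}^3 \to \mathbb{R}^2$ chosen only for concreteness; it is not an example from the text.

```latex
% Illustrative example (assumed map tau on R^3); requires amsmath
\[
\tau\colon \mathbb{R}^3 \to \mathbb{R}^2, \qquad \tau(x,y,z) = (x - z,\; y - z)
\]
% Kernel: x = z and y = z
\[
\ker(\tau) = \{(t,t,t) \mid t \in \mathbb{R}\} = \operatorname{span}\{(1,1,1)\},
\qquad \operatorname{im}(\tau) = \mathbb{R}^2
\]
% The induced map on cosets does not depend on the representative:
% replacing (x,y,z) by (x+t, y+t, z+t) leaves (x-z, y-z) unchanged
\[
\tau'\bigl((x,y,z) + \ker(\tau)\bigr) = (x - z,\; y - z)
\]
% Conclusion of the theorem, with the dimension count 3 - 1 = 2
\[
\frac{\mathbb{R}^3}{\operatorname{span}\{(1,1,1)\}} \approx \mathbb{R}^2
\]
```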


Quotient Spaces, Complements and Codimension 


The first isomorphism theorem gives some insight into the relationship between complements and quotient spaces. Let $S$ be a subspace of $V$ and let $T$ be a complement of $S$, that is,

$$V = S \oplus T$$

Applying the first isomorphism theorem to the projection operator $\rho_{T,S}\colon V \to T$ gives

$$T \approx V/S$$

Theorem 3.6 Let $S$ be a subspace of $V$. All complements of $S$ in $V$ are isomorphic to $V/S$ and hence to each other.


The previous theorem can be rephrased by writing 


$$A \oplus B = A \oplus C \Rightarrow B \approx C$$


On the other hand, quotients and complements do not behave as nicely with 
respect to isomorphisms as one might casually think. We leave it to the reader to 
show the following: 


1) It is possible that

$$A \oplus B = C \oplus D$$

with $A \approx C$ but $B \not\approx D$. Hence, $A \approx C$ does not imply that a complement of $A$ is isomorphic to a complement of $C$.
2) It is possible that $V \approx W$ and

$$V = S \oplus B \quad\text{and}\quad W = S \oplus D$$

but $B \not\approx D$. Hence, $V \approx W$ does not imply that $V/S \approx W/S$. (However, according to the previous theorem, if $V$ equals $W$ then $B \approx D$.)

Corollary 3.7 Let $S$ be a subspace of a vector space $V$. Then

$$\dim(V) = \dim(S) + \dim(V/S)$$ □

Definition If $S$ is a subspace of $V$, then $\dim(V/S)$ is called the codimension of $S$ in $V$ and is denoted by $\operatorname{codim}(S)$ or $\operatorname{codim}_V(S)$. □

Thus, the codimension of $S$ in $V$ is the dimension of any complement of $S$ in $V$ and when $V$ is finite-dimensional, we have

$$\operatorname{codim}_V(S) = \dim(V) - \dim(S)$$

(This makes no sense, in general, if $V$ is not finite-dimensional, since infinite cardinal numbers cannot be subtracted.)
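
To see why codimension is defined through the quotient rather than by subtraction, here is a small sketch using the assumed example $V = F[x]$ with $S$ the polynomials vanishing at $0$. The choice of $V$ and $S$ is illustrative and not taken from this passage.

```latex
% Illustrative example (assumed data: V = F[x], S = kernel of evaluation at 0)
\[
V = F[x], \qquad S = \{p(x) \in F[x] \mid p(0) = 0\}
\]
% S is the kernel of the surjective linear map p(x) -> p(0), so by the
% first isomorphism theorem
\[
\frac{V}{S} \approx F \qquad\text{and hence}\qquad \operatorname{codim}_V(S) = \dim(V/S) = 1
\]
% Both V and S have countably infinite bases, {1, x, x^2, ...} and {x, x^2, ...},
% so the expression dim(V) - dim(S) is not meaningful here, while
% dim(V) = dim(S) + dim(V/S) still holds as an equation of cardinals.
```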


Additional Isomorphism Theorems 


There are other isomorphism theorems that are direct consequences of the first isomorphism theorem. As we have seen, if $V = S \oplus T$ then $V/T \approx S$. This can be written

$$\frac{S \oplus T}{T} \approx \frac{S}{S \cap T}$$

This applies to nondirect sums as well.

Theorem 3.7 (The second isomorphism theorem) Let $V$ be a vector space and let $S$ and $T$ be subspaces of $V$. Then

$$\frac{S + T}{T} \approx \frac{S}{S \cap T}$$

Proof. Let $\tau\colon (S + T) \to S/(S \cap T)$ be defined by

$$\tau(s + t) = s + (S \cap T)$$

We leave it to the reader to show that $\tau$ is a well-defined surjective linear transformation, with kernel $T$. An application of the first isomorphism theorem then completes the proof. □
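
A quick dimension check can make the second isomorphism theorem tangible. The example below assumes $V = \mathbb{R}^3$ with standard basis $e_1, e_2, e_3$ and two coordinate planes; these choices are illustrative only.

```latex
% Illustrative example (assumed data: coordinate planes in R^3); requires amsmath
\[
S = \operatorname{span}\{e_1, e_2\}, \qquad T = \operatorname{span}\{e_2, e_3\}
\]
\[
S + T = \mathbb{R}^3, \qquad S \cap T = \operatorname{span}\{e_2\}
\]
% Both quotients are one-dimensional
\begin{align*}
\dim\frac{S+T}{T} &= 3 - 2 = 1, &
\dim\frac{S}{S \cap T} &= 2 - 1 = 1
\end{align*}
% The map of the proof sends a e_1 + b e_2 + c e_3 to the coset of a e_1,
% and its kernel is exactly T
\[
\tau(a e_1 + b e_2 + c e_3) = a e_1 + (S \cap T)
\]
```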


The following theorem demonstrates one way in which the expression $V/S$ behaves like a fraction.

Theorem 3.8 (The third isomorphism theorem) Let $V$ be a vector space and suppose that $S \subseteq T \subseteq V$ are subspaces of $V$. Then

$$\frac{V/S}{T/S} \approx \frac{V}{T}$$

Proof. Let $\tau\colon V/S \to V/T$ be defined by $\tau(v + S) = v + T$. We leave it to the reader to show that $\tau$ is a well-defined surjective linear transformation whose kernel is $T/S$. The rest follows from the first isomorphism theorem. □


The following theorem demonstrates one way in which the expression V/S 
does not behave like a fraction. 


Theorem 3.9 Let $V$ be a vector space and let $S$ be a subspace of $V$. Suppose that $V = V_1 \oplus V_2$ and $S = S_1 \oplus S_2$ with $S_i \subseteq V_i$. Then

$$\frac{V}{S} = \frac{V_1 \oplus V_2}{S_1 \oplus S_2} \approx \frac{V_1}{S_1} \boxplus \frac{V_2}{S_2}$$

Proof. Let $\tau\colon V \to (V_1/S_1) \boxplus (V_2/S_2)$ be defined by

$$\tau(v_1 + v_2) = (v_1 + S_1,\ v_2 + S_2)$$

This map is well-defined, since the sum $V = V_1 \oplus V_2$ is direct. We leave it to the reader to show that $\tau$ is a surjective linear transformation, whose kernel is $S_1 \oplus S_2$. The rest follows from the first isomorphism theorem. □


Linear Functionals 


Linear transformations from V to the base field F (thought of as a vector space 
over itself) are extremely important. 


Definition Let $V$ be a vector space over $F$. A linear transformation $f \in \mathcal{L}(V, F)$ whose values lie in the base field $F$ is called a linear functional (or simply functional) on $V$. (Some authors use the term linear function.) The vector space of all linear functionals on $V$ is denoted by $V^*$ and is called the algebraic dual space of $V$. □


The adjective algebraic is needed here, since there is another type of dual space 
that is defined on general normed vector spaces, where continuity of linear 
transformations makes sense. We will discuss the so-called continuous dual 
space briefly in Chapter 13. However, until then, the term “dual space” will 
refer to the algebraic dual space. 




To help distinguish linear functionals from other types of linear transformations, 
we will usually denote linear functionals by lowercase italic letters, such as f, g 
and h. 


Example 3.1 The map $f\colon F[x] \to F$ defined by $f(p(x)) = p(0)$ is a linear functional, known as evaluation at $0$. □

Example 3.2 Let $C[a, b]$ denote the vector space of all continuous functions on $[a, b] \subseteq \mathbb{R}$. Let $f\colon C[a, b] \to \mathbb{R}$ be defined by

$$f(\alpha(x)) = \int_a^b \alpha(x)\,dx$$

Then $f \in C[a, b]^*$. □


For any f € V*, the rank plus nullity theorem is 
dim(ker(f)) + dim(im(f)) = dim(V) 


But since $\operatorname{im}(f) \subseteq F$, we have either $\operatorname{im}(f) = \{0\}$, in which case $f$ is the zero linear functional, or $\operatorname{im}(f) = F$, in which case $f$ is surjective. In other words, a nonzero linear functional is surjective. Moreover, if $f \ne 0$, then

$$\operatorname{codim}(\ker(f)) = \dim\left(\frac{V}{\ker(f)}\right) = 1$$

and if $\dim(V) < \infty$, then

$$\dim(\ker(f)) = \dim(V) - 1$$


Thus, in dimensional terms, the kernel of a linear functional is a very “large” 
subspace of the domain V. 


The following theorem will prove very useful. 


Theorem 3.10

1) For any nonzero vector $v \in V$, there exists a linear functional $f \in V^*$ for which $f(v) \ne 0$.

2) A vector $v \in V$ is zero if and only if $f(v) = 0$ for all $f \in V^*$.

3) Let $f \in V^*$. If $f(x) \ne 0$, then

$$V = \langle x \rangle \oplus \ker(f)$$

4) Two nonzero linear functionals $f, g \in V^*$ have the same kernel if and only if there is a nonzero scalar $\lambda$ such that $f = \lambda g$.

Proof. For part 3), if $0 \ne v \in \langle x \rangle \cap \ker(f)$, then $f(v) = 0$ and $v = ax$ for $0 \ne a \in F$, whence $f(x) = 0$, which is false. Hence, $\langle x \rangle \cap \ker(f) = \{0\}$ and the direct sum $S = \langle x \rangle \oplus \ker(f)$ exists. Also, for any $v \in V$ we have

$$v = \frac{f(v)}{f(x)}x + \left(v - \frac{f(v)}{f(x)}x\right) \in \langle x \rangle + \ker(f)$$

and so $V = \langle x \rangle \oplus \ker(f)$.

For part 4), if $f = \lambda g$ for $\lambda \ne 0$, then $\ker(f) = \ker(g)$. Conversely, if $K = \ker(f) = \ker(g)$, then for $x \notin K$ we have by part 3),

$$V = \langle x \rangle \oplus K$$

Of course, $f|_K = \lambda g|_K$ for any $\lambda$. Therefore, if $\lambda = f(x)/g(x)$, it follows that $\lambda g(x) = f(x)$ and hence $f = \lambda g$. □
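
The decomposition in part 3) can be checked by hand in a small case. The sketch below assumes the functional $f(x,y,z) = x + y + z$ on $\mathbb{R}^3$ and the vector $x_0 = (1,0,0)$; both are illustrative choices, not examples from the text.

```latex
% Illustrative example (assumed functional f and vector x_0 on R^3); requires amsmath
\[
f\colon \mathbb{R}^3 \to \mathbb{R}, \qquad f(x,y,z) = x + y + z, \qquad x_0 = (1,0,0),\quad f(x_0) = 1 \ne 0
\]
% The kernel is a plane, of dimension 3 - 1 = 2
\[
\ker(f) = \{(x,y,z) \mid x + y + z = 0\}
\]
% Decompose v = (2,3,4) as in the proof: v = (f(v)/f(x_0)) x_0 + (v - (f(v)/f(x_0)) x_0)
\begin{align*}
f(v) &= 2 + 3 + 4 = 9 \\
v &= 9\,(1,0,0) + (-7,3,4), \qquad f(-7,3,4) = -7 + 3 + 4 = 0
\end{align*}
% So v lies in <x_0> + ker(f), with the second summand in the kernel
```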


Dual Bases 


Let $V$ be a vector space with basis $\mathcal{B} = \{v_i \mid i \in I\}$. For each $i \in I$, we can define a linear functional $v_i^* \in V^*$ by the orthogonality condition

$$v_i^*(v_j) = \delta_{i,j}$$

where $\delta_{i,j}$ is the Kronecker delta function, defined by

$$\delta_{i,j} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases}$$

Then the set $\mathcal{B}^* = \{v_i^* \mid i \in I\}$ is linearly independent, since applying the equation

$$0 = a_{i_1}v_{i_1}^* + \cdots + a_{i_n}v_{i_n}^*$$

to the basis vector $v_{i_k}$ gives

$$0 = \sum_{j=1}^n a_{i_j}v_{i_j}^*(v_{i_k}) = \sum_{j=1}^n a_{i_j}\delta_{i_j,i_k} = a_{i_k}$$

for all $i_k$.


Theorem 3.11 Let $V$ be a vector space with basis $\mathcal{B} = \{v_i \mid i \in I\}$.

1) The set $\mathcal{B}^* = \{v_i^* \mid i \in I\}$ is linearly independent.

2) If $V$ is finite-dimensional, then $\mathcal{B}^*$ is a basis for $V^*$, called the dual basis of $\mathcal{B}$.

Proof. For part 2), for any $f \in V^*$, we have

$$\sum_j f(v_j)v_j^*(v_i) = \sum_j f(v_j)\delta_{i,j} = f(v_i)$$

and so $f = \sum_j f(v_j)v_j^*$ is in the span of $\mathcal{B}^*$. Hence, $\mathcal{B}^*$ is a basis for $V^*$. □
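
The expansion $f = \sum_j f(v_j)v_j^*$ can be verified in a two-dimensional case. The basis and the functional below are assumed purely for illustration.

```latex
% Illustrative example (assumed basis of R^2 and functional f); requires amsmath
\[
v_1 = (1,1), \qquad v_2 = (1,-1)
\]
% Dual basis: the functionals satisfying v_i^*(v_j) = delta_{i,j}
\[
v_1^*(x,y) = \tfrac{1}{2}(x + y), \qquad v_2^*(x,y) = \tfrac{1}{2}(x - y)
\]
\begin{align*}
v_1^*(v_1) &= 1, & v_1^*(v_2) &= 0, & v_2^*(v_1) &= 0, & v_2^*(v_2) &= 1
\end{align*}
% Expanding f(x,y) = 3x + 5y in the dual basis
\begin{align*}
f(v_1) &= 8, \qquad f(v_2) = -2 \\
8\,v_1^*(x,y) - 2\,v_2^*(x,y) &= 4(x + y) - (x - y) = 3x + 5y = f(x,y)
\end{align*}
```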


It follows from the previous theorem that if $\dim(V) < \infty$, then

$$\dim(V^*) = \dim(V)$$


since the dual vectors also form a basis for V*. Our goal now is to show that the 
converse of this also holds. But first, let us consider an example. 


Example 3.3 Let $V$ be an infinite-dimensional vector space over the field $F = \mathbb{Z}_2 = \{0, 1\}$, with basis $\mathcal{B}$. Since the only coefficients in $F$ are $0$ and $1$, a finite linear combination over $F$ is just a finite sum. Hence, $V$ is the set of all finite sums of vectors in $\mathcal{B}$ and so according to Theorem 0.12,

$$|V| \le |\mathcal{P}_0(\mathcal{B})| = |\mathcal{B}|$$

On the other hand, each linear functional $f \in V^*$ is uniquely defined by specifying its values on the basis $\mathcal{B}$. Since these values must be either $0$ or $1$, specifying a linear functional is equivalent to specifying the subset of $\mathcal{B}$ on which $f$ takes the value $1$. In other words, there is a one-to-one correspondence between linear functionals on $V$ and all subsets of $\mathcal{B}$. Hence,

$$|V^*| = |\mathcal{P}(\mathcal{B})| > |\mathcal{B}| \ge |V|$$

This shows that $V^*$ cannot be isomorphic to $V$, nor to any proper subset of $V$. Hence, $\dim(V^*) > \dim(V)$. □


We wish to show that the behavior in the previous example is typical, in particular, that

$$\dim(V) \le \dim(V^*)$$

with equality if and only if $V$ is finite-dimensional. The proof uses the concept of the prime subfield of a field $K$, which is defined as the smallest subfield of the field $K$. Since $0, 1 \in K$, it follows that $K$ contains a copy of the integers

$$0,\ 1,\ 2 = 1 + 1,\ 3 = 1 + 1 + 1,\ \ldots$$

If $K$ has prime characteristic $p$, then $p = 0$ in $K$ and so $K$ contains the elements

$$\mathbb{Z}_p = \{0, 1, 2, \ldots, p - 1\}$$

which form a subfield of $K$. Since any subfield $F$ of $K$ contains $0$ and $1$, we see that $\mathbb{Z}_p \subseteq F$ and so $\mathbb{Z}_p$ is the prime subfield of $K$. On the other hand, if $K$ has characteristic $0$, then $K$ contains a "copy" of the integers $\mathbb{Z}$ and therefore also the rational numbers $\mathbb{Q}$, which is the prime subfield of $K$. Our main interest in the prime subfield is that in either case, the prime subfield is countable.

Theorem 3.12 Let $V$ be a vector space. Then

$$\dim(V) \le \dim(V^*)$$

with equality if and only if $V$ is finite-dimensional.


Proof. For any vector space $V$, we have

$$\dim(V) \le \dim(V^*)$$

since the dual vectors to a basis $\mathcal{B}$ for $V$ are linearly independent in $V^*$. We have already seen that if $V$ is finite-dimensional, then $\dim(V) = \dim(V^*)$. We wish to show that if $V$ is infinite-dimensional, then $\dim(V) < \dim(V^*)$. (The author is indebted to Professor Richard Foote for suggesting this line of proof.)

If $\mathcal{B}$ is a basis for $V$ and if $K$ is the base field for $V$, then Theorem 2.7 implies that

$$V \approx (K^{\mathcal{B}})_0$$

where $(K^{\mathcal{B}})_0$ is the set of all functions with finite support from $\mathcal{B}$ to $K$ and

$$V^* \approx K^{\mathcal{B}}$$

where $K^{\mathcal{B}}$ is the set of all functions from $\mathcal{B}$ to $K$. Thus, we can work with the vector spaces $(K^{\mathcal{B}})_0$ and $K^{\mathcal{B}}$.

The plan is to show that if $F$ is a countable subfield of $K$ and if $\mathcal{B}$ is infinite, then

$$\dim_K((K^{\mathcal{B}})_0) = \dim_F((F^{\mathcal{B}})_0) < \dim_F(F^{\mathcal{B}}) \le \dim_K(K^{\mathcal{B}})$$

Since we may take $F$ to be the prime subfield of $K$, this will prove the theorem. The first equality follows from the fact that the $K$-space $(K^{\mathcal{B}})_0$ and the $F$-space $(F^{\mathcal{B}})_0$ each have a basis consisting of the "standard" linear functionals $\{f_v \mid v \in \mathcal{B}\}$ defined by

$$f_v(u) = \delta_{u,v}$$

for all $u \in \mathcal{B}$, where $\delta_{u,v}$ is the Kronecker delta function.


For the final inequality, suppose that $\{f_i\} \subseteq F^{\mathcal{B}}$ is linearly independent over $F$ and that

$$\sum_i a_i f_i = 0$$

where $a_i \in K$. If $\{\kappa_j\}$ is a basis for $K$ over $F$, then $a_i = \sum_j a_{i,j}\kappa_j$ for $a_{i,j} \in F$ and so

$$0 = \sum_i a_i f_i = \sum_i \sum_j a_{i,j}\kappa_j f_i$$

Evaluating at any $v \in \mathcal{B}$ gives

$$0 = \sum_i \sum_j a_{i,j}\kappa_j f_i(v) = \sum_j \left(\sum_i a_{i,j} f_i(v)\right)\kappa_j$$

and since the inner sums are in $F$ and $\{\kappa_j\}$ is $F$-independent, the inner sums must be zero:

$$\sum_i a_{i,j} f_i(v) = 0$$

Since this holds for all $v \in \mathcal{B}$, we have

$$\sum_i a_{i,j} f_i = 0$$

which implies that $a_{i,j} = 0$ for all $i, j$. Hence, $\{f_i\}$ is linearly independent over $K$. This proves that $\dim_F(F^{\mathcal{B}}) \le \dim_K(K^{\mathcal{B}})$.


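For the center inequality, it is clear that

$$\dim_F((F^{\mathcal{B}})_0) \le \dim_F(F^{\mathcal{B}})$$

We will show that the inequality must be strict by showing that the cardinality of $(F^{\mathcal{B}})_0$ is $|\mathcal{B}|$ whereas the cardinality of $F^{\mathcal{B}}$ is greater than $|\mathcal{B}|$. To this end, the set $(F^{\mathcal{B}})_0$ can be partitioned into blocks based on the support of the function. In particular, for each finite subset $S$ of $\mathcal{B}$, if we let

$$A_S = \{f \in (F^{\mathcal{B}})_0 \mid \operatorname{supp}(f) = S\}$$

then

$$(F^{\mathcal{B}})_0 = \bigcup_{\substack{S \subseteq \mathcal{B} \\ S \text{ finite}}} A_S$$

where the union is disjoint. Moreover, if $|S| = n$, then

$$|A_S| \le |F|^n \le \aleph_0$$

and so

$$|(F^{\mathcal{B}})_0| = \sum_{\substack{S \subseteq \mathcal{B} \\ S \text{ finite}}} |A_S| \le |\mathcal{B}| \cdot \aleph_0 = \max(|\mathcal{B}|, \aleph_0) = |\mathcal{B}|$$

But since the reverse inequality is easy to establish, we have

$$|(F^{\mathcal{B}})_0| = |\mathcal{B}|$$

As to the cardinality of $F^{\mathcal{B}}$, for each subset $T$ of $\mathcal{B}$, there is a function $f_T \in F^{\mathcal{B}}$ that sends every element of $T$ to $1$ and every element of $\mathcal{B} \setminus T$ to $0$. Clearly, each distinct subset $T$ gives rise to a distinct function $f_T$ and so Cantor's theorem implies that

$$|F^{\mathcal{B}}| \ge |2^{\mathcal{B}}| > |\mathcal{B}| = |(F^{\mathcal{B}})_0|$$

This shows that

$$\dim_F((F^{\mathcal{B}})_0) < \dim_F(F^{\mathcal{B}})$$

and completes the proof. □
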
Reflexivity 


If $V$ is a vector space, then so is the dual space $V^*$ and so we may form the double (algebraic) dual space $V^{**}$, which consists of all linear functionals $\sigma\colon V^* \to F$. In other words, an element $\sigma$ of $V^{**}$ is a linear functional that assigns a scalar to each linear functional on $V$.

With this firmly in mind, there is one rather obvious way to obtain an element of $V^{**}$. Namely, if $v \in V$, consider the map $\overline{v}\colon V^* \to F$ defined by

$$\overline{v}(f) = f(v)$$

which sends the linear functional $f$ to the scalar $f(v)$. The map $\overline{v}$ is called evaluation at $v$. To see that $\overline{v} \in V^{**}$, if $f, g \in V^*$ and $a, b \in F$, then

$$\overline{v}(af + bg) = (af + bg)(v) = af(v) + bg(v) = a\overline{v}(f) + b\overline{v}(g)$$

and so $\overline{v}$ is indeed linear.


We can now define a map $\tau\colon V \to V^{**}$ by

$$\tau v = \overline{v}$$

This is called the canonical map (or the natural map) from $V$ to $V^{**}$. This map is injective and hence in the finite-dimensional case, it is also surjective.


Theorem 3.13 The canonical map $\tau\colon V \to V^{**}$ defined by $\tau v = \overline{v}$, where $\overline{v}$ is evaluation at $v$, is a monomorphism. If $V$ is finite-dimensional, then $\tau$ is an isomorphism.

Proof. The map $\tau$ is linear since

$$\overline{au + bv}(f) = f(au + bv) = af(u) + bf(v) = (a\overline{u} + b\overline{v})(f)$$

for all $f \in V^*$. To determine the kernel of $\tau$, observe that

$$
\begin{aligned}
\tau v = 0 &\Rightarrow \overline{v} = 0 \\
&\Rightarrow \overline{v}(f) = 0 \text{ for all } f \in V^* \\
&\Rightarrow f(v) = 0 \text{ for all } f \in V^* \\
&\Rightarrow v = 0
\end{aligned}
$$

by Theorem 3.10 and so $\ker(\tau) = \{0\}$. In the finite-dimensional case, since

$$\dim(V^{**}) = \dim(V^*) = \dim(V)$$

it follows that $\tau$ is also surjective, hence an isomorphism. □

Note that if $\dim(V) < \infty$, then since the dimensions of $V$ and $V^{**}$ are the same, we deduce immediately that $V \approx V^{**}$. This is not the point of Theorem 3.13. The point is that the natural map $v \to \overline{v}$ is an isomorphism. Because of this, $V$ is said to be algebraically reflexive. Theorem 3.13 and Theorem 3.12 together imply that a vector space is algebraically reflexive if and only if it is finite-dimensional.


If V is finite-dimensional, it is customary to identify the double dual space V** 
with V and to think of the elements of V** simply as vectors in V. Let us 
consider a specific example to show how algebraic reflexivity fails in the 
infinite-dimensional case. 
Example 3.4 Let $V$ be the vector space over $\mathbb{Z}_2$ with basis

$$e_k = (0, \ldots, 0, 1, 0, \ldots)$$

where the $1$ is in the $k$th position. Thus, $V$ is the set of all infinite binary sequences with a finite number of $1$'s. Define the order $o(v)$ of any $v \in V$ to be the largest coordinate of $v$ with value $1$. Then $o(v) < \infty$ for all $v \in V$.

Consider the dual vectors $e_k^*$, defined (as usual) by

$$e_k^*(e_j) = \delta_{k,j}$$

For any $v \in V$, the evaluation functional $\overline{v}$ has the property that

$$\overline{v}(e_k^*) = e_k^*(v) = 0 \text{ if } k > o(v)$$

However, since the dual vectors $e_k^*$ are linearly independent, there is a linear functional $f \in V^{**}$ for which

$$f(e_k^*) = 1$$

for all $k \ge 1$. Hence, $f$ does not have the form $\overline{v}$ for any $v \in V$. This shows that the canonical map is not surjective and so $V$ is not algebraically reflexive.


Annihilators 


The functions $f \in V^*$ are defined on vectors in $V$, but we may also define $f$ on subsets $M$ of $V$ by letting

$$f(M) = \{f(v) \mid v \in M\}$$

Definition Let $M$ be a nonempty subset of a vector space $V$. The annihilator $M^0$ of $M$ is

$$M^0 = \{f \in V^* \mid f(M) = \{0\}\}$$ □

The term annihilator is quite descriptive, since $M^0$ consists of all linear functionals that annihilate (send to $0$) every vector in $M$. It is not hard to see that $M^0$ is a subspace of $V^*$, even when $M$ is not a subspace of $V$.


The basic properties of annihilators are contained in the following theorem.

Theorem 3.14
1) (Order-reversing) If $M$ and $N$ are nonempty subsets of $V$, then

$$M \subseteq N \Rightarrow N^0 \subseteq M^0$$

2) If $\dim(V) < \infty$, then for any nonempty subset $M$ of $V$ the natural map

$$\tau\colon \operatorname{span}(M) \approx M^{00}$$

is an isomorphism from $\operatorname{span}(M)$ onto $M^{00}$. In particular, if $S$ is a subspace of $V$, then $S^{00} \approx S$.
3) If $S$ and $T$ are subspaces of $V$, then

$$(S \cap T)^0 = S^0 + T^0 \quad\text{and}\quad (S + T)^0 = S^0 \cap T^0$$

Proof. We leave proof of part 1) for the reader. For part 2), since

$$M^{00} = (\operatorname{span}(M))^{00}$$

it is sufficient to prove that $\tau\colon S \approx S^{00}$ is an isomorphism, where $S$ is a subspace of $V$. Now, we know that $\tau$ is a monomorphism, so it remains to prove that $\tau S = S^{00}$. If $s \in S$, then $\tau s = \overline{s}$ has the property that for all $f \in S^0$,

$$\overline{s}(f) = fs = 0$$

and so $\tau s = \overline{s} \in S^{00}$, which implies that $\tau S \subseteq S^{00}$. Moreover, if $\overline{v} \in S^{00}$, then for all $f \in S^0$ we have

$$f(v) = \overline{v}(f) = 0$$

and so every linear functional that annihilates $S$ also annihilates $v$. But if $v \notin S$, then there is a linear functional $g \in V^*$ for which $g(S) = \{0\}$ and $g(v) \ne 0$. (We leave proof of this as an exercise.) Hence, $v \in S$ and so $\overline{v} = \tau v \in \tau S$ and so $S^{00} \subseteq \tau S$.

For part 3), it is clear that $f$ annihilates $S + T$ if and only if $f$ annihilates both $S$ and $T$. Hence, $(S + T)^0 = S^0 \cap T^0$. Also, if $f = g + h \in S^0 + T^0$ where $g \in S^0$ and $h \in T^0$, then $g, h \in (S \cap T)^0$ and so $f \in (S \cap T)^0$. Thus,

$$S^0 + T^0 \subseteq (S \cap T)^0$$

For the reverse inclusion, suppose that $f \in (S \cap T)^0$. Write

$$V = S' \oplus (S \cap T) \oplus T' \oplus U$$

where $S = S' \oplus (S \cap T)$ and $T = (S \cap T) \oplus T'$. Define $g \in V^*$ by

$$g|_{S'} = f, \quad g|_{S \cap T} = f|_{S \cap T} = 0, \quad g|_{T'} = 0, \quad g|_U = f$$

and define $h \in V^*$ by

$$h|_{S'} = 0, \quad h|_{S \cap T} = f|_{S \cap T} = 0, \quad h|_{T'} = f, \quad h|_U = 0$$

It follows that $g \in T^0$, $h \in S^0$ and $g + h = f$. □

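
Part 3) of Theorem 3.14 can be checked directly in a small case. The sketch below assumes $V = \mathbb{R}^3$ with standard basis $e_1, e_2, e_3$ and dual basis $e_1^*, e_2^*, e_3^*$; the subspaces are illustrative choices only.

```latex
% Illustrative example (assumed data: coordinate planes in R^3); requires amsmath
\[
S = \operatorname{span}\{e_1, e_2\}, \qquad T = \operatorname{span}\{e_2, e_3\}
\]
% A functional a e_1^* + b e_2^* + c e_3^* annihilates S iff a = b = 0, and T iff b = c = 0
\[
S^0 = \operatorname{span}\{e_3^*\}, \qquad T^0 = \operatorname{span}\{e_1^*\}
\]
% Intersection and sum of the subspaces
\begin{align*}
S \cap T &= \operatorname{span}\{e_2\}, &
(S \cap T)^0 &= \operatorname{span}\{e_1^*, e_3^*\} = S^0 + T^0 \\
S + T &= \mathbb{R}^3, &
(S + T)^0 &= \{0\} = S^0 \cap T^0
\end{align*}
```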
Annihilators and Direct Sums 
Consider a direct sum decomposition

$$V = S \oplus T$$

Then any linear functional $f \in T^*$ can be extended to a linear functional $\overline{f}$ on $V$ by setting $\overline{f}(S) = 0$. Let us call this extension by $0$. Clearly, $\overline{f} \in S^0$ and it is easy to see that the extension by $0$ map $f \to \overline{f}$ is an isomorphism from $T^*$ to $S^0$, whose inverse is the restriction to $T$.

Theorem 3.15 Let $V = S \oplus T$.
a) The extension by $0$ map is an isomorphism from $T^*$ to $S^0$ and so

$$T^* \approx S^0$$

b) If $V$ is finite-dimensional, then

$$\dim(S^0) = \operatorname{codim}_V(S) = \dim(V) - \dim(S)$$ □

Example 3.5 Part b) of Theorem 3.15 may fail in the infinite-dimensional case, since it may easily happen that $S^0 \approx V^*$. As an example, let $V$ be the vector space over $\mathbb{Z}_2$ with a countably infinite ordered basis $\mathcal{B} = (e_1, e_2, \ldots)$. Let $S = \langle e_1 \rangle$ and $T = \langle e_2, e_3, \ldots \rangle$. It is easy to see that $S^0 \approx T^* \approx V^*$ and that $\dim(V^*) > \dim(V)$. □


The annihilator provides a way to describe the dual space of a direct sum. 


Theorem 3.16 A linear functional on the direct sum $V = S \oplus T$ can be written as a sum of a linear functional that annihilates $S$ and a linear functional that annihilates $T$, that is,

$$(S \oplus T)^* = S^0 \oplus T^0$$

Proof. Clearly $S^0 \cap T^0 = \{0\}$, since any functional that annihilates both $S$ and $T$ must annihilate $S \oplus T = V$. Hence, the sum $S^0 + T^0$ is direct. The rest follows from Theorem 3.14, since

$$V^* = \{0\}^0 = (S \cap T)^0 = S^0 + T^0 = S^0 \oplus T^0$$

Alternatively, since $\rho_T + \rho_S = \iota$ is the identity map, if $f \in V^*$, then we can write

$$f = f \circ (\rho_T + \rho_S) = (f \circ \rho_T) + (f \circ \rho_S) \in S^0 \oplus T^0$$

and so $V^* = S^0 \oplus T^0$. □

Operator Adjoints 
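If $\tau \in \mathcal{L}(V, W)$, then we may define a map $\tau^\times\colon W^* \to V^*$ by

$$\tau^\times(f) = f \circ \tau = f\tau$$

for $f \in W^*$. (We will write composition as juxtaposition.) Thus, for any $v \in V$,

$$[\tau^\times(f)](v) = f(\tau v)$$

The map $\tau^\times$ is called the operator adjoint of $\tau$ and can be described by the phrase "apply $\tau$ first."

Theorem 3.17 (Properties of the Operator Adjoint)
1) For $\tau, \sigma \in \mathcal{L}(V, W)$ and $a, b \in F$,

$$(a\tau + b\sigma)^\times = a\tau^\times + b\sigma^\times$$

2) For $\sigma \in \mathcal{L}(V, W)$ and $\tau \in \mathcal{L}(W, U)$,

$$(\tau\sigma)^\times = \sigma^\times\tau^\times$$

3) For any invertible $\tau \in \mathcal{L}(V)$,

$$(\tau^{-1})^\times = (\tau^\times)^{-1}$$

Proof. Proof of part 1) is left for the reader. For part 2), we have for all $f \in U^*$,

$$(\tau\sigma)^\times(f) = f(\tau\sigma) = \sigma^\times(f\tau) = \sigma^\times(\tau^\times(f)) = (\sigma^\times\tau^\times)(f)$$

Part 3) follows from part 2) and

$$\tau^\times(\tau^{-1})^\times = (\tau^{-1}\tau)^\times = \iota^\times = \iota$$

and in the same way, $(\tau^{-1})^\times\tau^\times = \iota$. Hence $(\tau^{-1})^\times = (\tau^\times)^{-1}$. □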


If $\tau \in \mathcal{L}(V, W)$, then $\tau^\times \in \mathcal{L}(W^*, V^*)$ and so $\tau^{\times\times} \in \mathcal{L}(V^{**}, W^{**})$. Of course, $\tau^{\times\times}$ is not equal to $\tau$. However, in the finite-dimensional case, if we use the natural maps to identify $V^{**}$ with $V$ and $W^{**}$ with $W$, then we can think of $\tau^{\times\times}$ as being in $\mathcal{L}(V, W)$. Using these identifications, we do have equality in the finite-dimensional case.

Theorem 3.18 Let $V$ and $W$ be finite-dimensional and let $\tau \in \mathcal{L}(V, W)$. If we identify $V^{**}$ with $V$ and $W^{**}$ with $W$ using the natural maps, then $\tau^{\times\times}$ is identified with $\tau$.

Proof. For any $x \in V$ let the corresponding element of $V^{**}$ be denoted by $\overline{x}$ and similarly for $W$. Then before making any identifications, we have for $v \in V$,

$$\tau^{\times\times}(\overline{v})(f) = \overline{v}[\tau^\times(f)] = \overline{v}(f\tau) = f(\tau v) = \overline{\tau v}(f)$$

for all $f \in W^*$ and so

$$\tau^{\times\times}(\overline{v}) = \overline{\tau v} \in W^{**}$$

Therefore, using the canonical identifications for both $V^{**}$ and $W^{**}$ we have

$$\tau^{\times\times}(v) = \tau v$$

for all $v \in V$. □

The next result describes the kernel and image of the operator adjoint. 


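Theorem 3.19 Let $\tau \in \mathcal{L}(V, W)$. Then
1) $\ker(\tau^\times) = \operatorname{im}(\tau)^0$

2) $\operatorname{im}(\tau^\times) = \ker(\tau)^0$

Proof. For part 1),

$$
\begin{aligned}
\ker(\tau^\times) &= \{f \in W^* \mid \tau^\times(f) = 0\} \\
&= \{f \in W^* \mid f(\tau V) = \{0\}\} \\
&= \{f \in W^* \mid f(\operatorname{im}(\tau)) = \{0\}\} \\
&= \operatorname{im}(\tau)^0
\end{aligned}
$$

For part 2), if $f = g\tau = \tau^\times g \in \operatorname{im}(\tau^\times)$, then $\ker(\tau) \subseteq \ker(f)$ and so $f \in \ker(\tau)^0$.

For the reverse inclusion, let $f \in \ker(\tau)^0 \subseteq V^*$. We wish to show that $f = \tau^\times g = g\tau$ for some $g \in W^*$. On $K = \ker(\tau)$, there is no problem since $f$ and $\tau^\times g = g\tau$ agree on $K$ for any $g \in W^*$. Let $S$ be a complement of $\ker(\tau)$. Then $\tau$ maps a basis $\mathcal{B} = \{b_i \mid i \in I\}$ for $S$ to a linearly independent set

$$\tau\mathcal{B} = \{\tau b_i \mid i \in I\}$$

in $W$ and so we can define $g \in W^*$ on $\tau\mathcal{B}$ by setting

$$g(\tau b_i) = f b_i$$

and extending to all of $W$. Then $f = g\tau = \tau^\times g$ on $\mathcal{B}$ and therefore on $S$. Thus, $f = \tau^\times g \in \operatorname{im}(\tau^\times)$. □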
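
Both parts of Theorem 3.19 can be verified on a rank-one map. The example below is a sketch under assumed data: $\tau$ acts on $\mathbb{R}^2$ with standard basis $b_1, b_2$ in the domain, $c_1, c_2$ in the codomain, and the corresponding dual bases. The notation $\tau^\times$ for the operator adjoint follows the reconstruction used here.

```latex
% Illustrative example (assumed rank-one map on R^2); requires amsmath
\[
\tau\colon \mathbb{R}^2 \to \mathbb{R}^2, \qquad \tau(x,y) = (x + y,\; x + y)
\]
\[
\operatorname{im}(\tau) = \operatorname{span}\{(1,1)\}, \qquad
\ker(\tau) = \operatorname{span}\{(1,-1)\}
\]
% The adjoint on a general functional a c_1^* + b c_2^*:
% (a c_1^* + b c_2^*)(tau(x,y)) = (a + b)(x + y)
\[
\tau^\times(a c_1^* + b c_2^*) = (a + b)(b_1^* + b_2^*)
\]
% Part 1: the kernel of the adjoint is the annihilator of the image
\[
\ker(\tau^\times) = \{a c_1^* + b c_2^* \mid a + b = 0\}
= \operatorname{span}\{c_1^* - c_2^*\} = \operatorname{im}(\tau)^0
\]
% Part 2: the image of the adjoint is the annihilator of the kernel
\[
\operatorname{im}(\tau^\times) = \operatorname{span}\{b_1^* + b_2^*\}
= \{\alpha b_1^* + \beta b_2^* \mid \alpha - \beta = 0\} = \ker(\tau)^0
\]
```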


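Corollary 3.20 Let $\tau \in \mathcal{L}(V, W)$, where $V$ and $W$ are finite-dimensional. Then $\operatorname{rk}(\tau) = \operatorname{rk}(\tau^\times)$. □

In the finite-dimensional case, $\tau$ and $\tau^\times$ can both be represented by matrices. Let

$$\mathcal{B} = (b_1, \ldots, b_n) \quad\text{and}\quad \mathcal{C} = (c_1, \ldots, c_m)$$

be ordered bases for $V$ and $W$, respectively, and let

$$\mathcal{B}^* = (b_1^*, \ldots, b_n^*) \quad\text{and}\quad \mathcal{C}^* = (c_1^*, \ldots, c_m^*)$$

be the corresponding dual bases. Then

$$([\tau]_{\mathcal{B},\mathcal{C}})_{i,j} = ([\tau b_j]_{\mathcal{C}})_i = c_i^*[\tau b_j]$$

and

$$([\tau^\times]_{\mathcal{C}^*,\mathcal{B}^*})_{i,j} = ([\tau^\times(c_j^*)]_{\mathcal{B}^*})_i = b_i^{**}[\tau^\times(c_j^*)] = \tau^\times(c_j^*)(b_i) = c_j^*(\tau b_i)$$

Comparing the last two expressions we see that they are the same except that the roles of $i$ and $j$ are reversed. Hence, the matrices in question are transposes.

Theorem 3.21 Let $\tau \in \mathcal{L}(V, W)$, where $V$ and $W$ are finite-dimensional. If $\mathcal{B}$ and $\mathcal{C}$ are ordered bases for $V$ and $W$, respectively, and $\mathcal{B}^*$ and $\mathcal{C}^*$ are the corresponding dual bases, then

$$[\tau^\times]_{\mathcal{C}^*,\mathcal{B}^*} = ([\tau]_{\mathcal{B},\mathcal{C}})^t$$

In words, the matrices of $\tau$ and its operator adjoint $\tau^\times$ are transposes of one another. □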
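
The transpose relationship can be seen in a small computation. The sketch below assumes a specific map $\tau\colon \mathbb{R}^3 \to \mathbb{R}^2$ with the standard ordered bases $\mathcal{B} = (b_1, b_2, b_3)$ and $\mathcal{C} = (c_1, c_2)$; the map is an illustrative choice, and $\tau^\times$ denotes the operator adjoint as reconstructed here.

```latex
% Illustrative example (assumed map tau: R^3 -> R^2, standard bases); requires amsmath
\[
\tau(x,y,z) = (x + 2y,\; 3z), \qquad
[\tau]_{\mathcal{B},\mathcal{C}} =
\begin{pmatrix} 1 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}
\]
% Apply tau first, then the dual basis functionals of C
\begin{align*}
\tau^\times(c_1^*)(x,y,z) &= x + 2y  &&\Longrightarrow\quad \tau^\times(c_1^*) = b_1^* + 2b_2^* \\
\tau^\times(c_2^*)(x,y,z) &= 3z      &&\Longrightarrow\quad \tau^\times(c_2^*) = 3b_3^*
\end{align*}
% The coordinate vectors of these images form the columns of the adjoint's matrix
\[
[\tau^\times]_{\mathcal{C}^*,\mathcal{B}^*} =
\begin{pmatrix} 1 & 0 \\ 2 & 0 \\ 0 & 3 \end{pmatrix}
= \bigl([\tau]_{\mathcal{B},\mathcal{C}}\bigr)^t
\]
```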


Exercises 


1. If $V$ is infinite-dimensional and $S$ is an infinite-dimensional subspace, must the dimension of $V/S$ be finite? Explain.

2. Prove the correspondence theorem.

3. Prove the first isomorphism theorem.

4. Complete the proof of Theorem 3.9.

5. Let $S$ be a subspace of $V$. Starting with a basis $\{s_1, \ldots, s_k\}$ for $S$, how would you find a basis for $V/S$?

6. Use the first isomorphism theorem to prove the rank-plus-nullity theorem

$$\operatorname{rk}(\tau) + \operatorname{null}(\tau) = \dim(V)$$

for $\tau \in \mathcal{L}(V, W)$ and $\dim(V) < \infty$.

7. Let $\tau \in \mathcal{L}(V)$ and suppose that $S$ is a subspace of $V$. Define a map $\tau'\colon V/S \to V/S$ by

$$\tau'(v + S) = \tau v + S$$

When is $\tau'$ well-defined? If $\tau'$ is well-defined, is it a linear transformation? What are $\operatorname{im}(\tau')$ and $\ker(\tau')$?

8. Show that for any nonzero vector $v \in V$, there exists a linear functional $f \in V^*$ for which $f(v) \ne 0$.

9. Show that a vector $v \in V$ is zero if and only if $f(v) = 0$ for all $f \in V^*$.

10. Let $S$ be a proper subspace of a finite-dimensional vector space $V$ and let $v \in V \setminus S$. Show that there is a linear functional $f \in V^*$ for which $f(v) = 1$ and $f(s) = 0$ for all $s \in S$.

11. Find a vector space $V$ and decompositions

$$V = A \oplus B = C \oplus D$$

with $A \approx C$ but $B \not\approx D$. Hence, $A \approx C$ does not imply that $A^c \approx C^c$.

12. Find isomorphic vector spaces $V$ and $W$ with

$$V = S \oplus B \quad\text{and}\quad W = S \oplus D$$

but $B \not\approx D$. Hence, $V \approx W$ does not imply that $V/S \approx W/S$.

13. Let $V$ be a vector space with

$$V = S_1 \oplus T_1 = S_2 \oplus T_2$$

Prove that if $S_1$ and $S_2$ have finite codimension in $V$, then so does $S_1 \cap S_2$ and

$$\operatorname{codim}(S_1 \cap S_2) \le \dim(T_1) + \dim(T_2)$$

14. Let $V$ be a vector space with

$$V = S_1 \oplus T_1 = S_2 \oplus T_2$$

Suppose that $S_1$ and $S_2$ have finite codimension. Hence, by the previous exercise, so does $S_1 \cap S_2$. Find a direct sum decomposition $V = W \oplus X$ for which (1) $W$ has finite codimension, (2) $W \subseteq S_1 \cap S_2$ and (3) $X \supseteq T_1 + T_2$.

15. Let $\mathcal{B}$ be a basis for an infinite-dimensional vector space $V$ and define, for all $b \in \mathcal{B}$, the map $b' \in V^*$ by $b'(c) = 1$ if $c = b$ and $0$ otherwise, for all $c \in \mathcal{B}$. Does $\{b' \mid b \in \mathcal{B}\}$ form a basis for $V^*$? What do you conclude about the concept of a dual basis?

16. Prove that if $S$ and $T$ are subspaces of $V$, then $(S \oplus T)^* \approx S^* \boxplus T^*$.

17. Prove that $0^\times = 0$ and $\iota^\times = \iota$ where $0$ is the zero linear operator and $\iota$ is the identity.

18. Let $S$ be a subspace of $V$. Prove that $(V/S)^* \approx S^0$.

19. Verify that

a) $(\tau + \sigma)^\times = \tau^\times + \sigma^\times$ for $\tau, \sigma \in \mathcal{L}(V, W)$.

b) $(r\tau)^\times = r\tau^\times$ for any $r \in F$ and $\tau \in \mathcal{L}(V, W)$.

20. Let $\tau \in \mathcal{L}(V, W)$, where $V$ and $W$ are finite-dimensional. Prove that $\operatorname{rk}(\tau) = \operatorname{rk}(\tau^\times)$.


Chapter 4 
Modules I: Basic Properties 


Motivation 


Let V be a vector space over a field F and let τ ∈ L(V). Then for any
polynomial p(x) ∈ F[x], the operator p(τ) is well-defined. For instance, if
p(x) = 1 + 2x + x^3, then

p(τ) = ι + 2τ + τ^3

where ι is the identity operator and τ^3 is the threefold composition τ ∘ τ ∘ τ.


Thus, using the operator τ we can define the product of a polynomial
p(x) ∈ F[x] and a vector v ∈ V by

p(x)v = p(τ)(v)    (4.1)


This product satisfies the usual properties of scalar multiplication, namely, for
all r(x), s(x) ∈ F[x] and u, v ∈ V,

r(x)(u + v) = r(x)u + r(x)v
(r(x) + s(x))u = r(x)u + s(x)u
[r(x)s(x)]u = r(x)[s(x)u]
1u = u


Thus, for a fixed τ ∈ L(V), we can think of V as being endowed with the
operations of addition and multiplication of an element of V by a polynomial in
F[x]. However, since F[x] is not a field, these two operations do not make V
into a vector space. Nevertheless, the situation in which the scalars form a ring
but not a field is extremely important, not only in this context but in many
others.
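The product (4.1) can be tried out numerically. The following is a minimal sketch, assuming that the operator is given by an integer matrix and that a polynomial is stored as its list of coefficients; the helper names mat_vec, poly_act and poly_mul are illustrative and are not notation used elsewhere in the text. It compares [r(x)s(x)]u with r(x)[s(x)u] for one choice of operator and vector.

```python
# Polynomials are coefficient lists [c0, c1, c2, ...]; the operator is a
# square matrix given as a list of rows. All names here are illustrative.

def mat_vec(A, v):
    """Apply the matrix A to the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def poly_act(p, A, v):
    """Return p(A)v = sum_k p[k] * A^k v, the product defined in (4.1)."""
    result = [0] * len(v)
    power = list(v)                      # A^0 v
    for c in p:
        result = [r + c * w for r, w in zip(result, power)]
        power = mat_vec(A, power)        # advance to the next power of A
    return result

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

A = [[0, 1], [1, 1]]      # an arbitrary operator, chosen for illustration
u = [1, 2]
r = [1, 2, 0, 1]          # r(x) = 1 + 2x + x^3
s = [0, 1, 1]             # s(x) = x + x^2

lhs = poly_act(poly_mul(r, s), A, u)      # [r(x)s(x)]u
rhs = poly_act(r, A, poly_act(s, A, u))   # r(x)[s(x)u]
print(lhs == rhs)
```

A single numerical check is of course not a proof; it only illustrates how the operator turns polynomials into scalars.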


Modules 


Definition Let R be a commutative ring with identity, whose elements are
called scalars. An R-module (or a module over R) is a nonempty set M,
together with two operations. The first operation, called addition and denoted
by +, assigns to each pair (u, v) ∈ M × M an element u + v ∈ M. The
second operation, denoted by juxtaposition, assigns to each pair
(r, v) ∈ R × M an element rv ∈ M. Furthermore, the following properties
must hold:

1) M is an abelian group under addition.

2) For all r, s ∈ R and u, v ∈ M,

r(u + v) = ru + rv
(r + s)u = ru + su
(rs)u = r(su)
1u = u

The ring R is called the base ring of M. □


Note that vector spaces are just special types of modules: a vector space is a 
module over a field. 


When we turn in a later chapter to the study of the structure of a linear
transformation τ ∈ L(V), we will think of V as having the structure of a vector
space over F as well as a module over F[x] and we will use the notation V_τ. Put
another way, V_τ is an abelian group under addition, with two scalar
multiplications: one whose scalars are elements of F and one whose scalars are
polynomials over F. This viewpoint will be of tremendous benefit for the study
of τ. For now, we concentrate only on modules.


Example 4.1 

1) If R is a ring, the set R^n of all ordered n-tuples whose components lie in R
is an R-module, with addition and scalar multiplication defined
componentwise (just as in F^n),

(a_1, ..., a_n) + (b_1, ..., b_n) = (a_1 + b_1, ..., a_n + b_n)

and

r(a_1, ..., a_n) = (ra_1, ..., ra_n)

for a_i, b_i, r ∈ R. For example, Z^n is the Z-module of all ordered n-tuples
of integers.

2) If R is a ring, the set M_{m,n}(R) of all matrices of size m × n is an
R-module, under the usual operations of matrix addition and scalar
multiplication over R. Since R is a ring, we can also take the product of
matrices in M_{m,n}(R). One important example is R = F[x], whence
M_{m,n}(F[x]) is the F[x]-module of all m × n matrices whose entries are
polynomials.

3) Any commutative ring R with identity is a module over itself, that is, R is
an R-module. In this case, scalar multiplication is just multiplication by
elements of R, that is, scalar multiplication is the ring multiplication. The
defining properties of a ring imply that the defining properties of the
R-module R are satisfied. We shall use this example many times in the
sequel.


Importance of the Base Ring 


Our definition of a module requires that the ring R of scalars be commutative. 
Modules over noncommutative rings can exhibit quite a bit more unusual 
behavior than modules over commutative rings. Indeed, as one would expect, 
the general behavior of R-modules improves as we impose more structure on 
the base ring R. If we impose the very strict structure of a field, the result is the 
very well behaved vector space. 


To illustrate, we will give an example of a module over a noncommutative ring
that has a basis of size n for every integer n > 0! As another example, if the
base ring is an integral domain, then whenever v_1, ..., v_n are linearly
independent over R so are rv_1, ..., rv_n for any nonzero r ∈ R. This can fail
when R is not an integral domain.


We will also consider the property on the base ring R that all of its ideals are 
finitely generated. In this case, any finitely generated R-module M has the 
property that all of its submodules are also finitely generated. This property of 
R-modules fails if R does not have the stated property. 


When R is a principal ideal domain (such as Z or F[x]), each of its ideals is
generated by a single element. In this case, the R-modules are “reasonably” well 
behaved. For instance, in general, a module may have a basis and yet possess a 
submodule that has no basis. However, if R is a principal ideal domain, this 
cannot happen. 


Nevertheless, even when R is a principal ideal domain, R-modules are less well 
behaved than vector spaces. For example, there are modules over a principal 
ideal domain that do not have any linearly independent elements. Of course, 
such modules cannot have a basis. 


Submodules 


Many of the basic concepts that we defined for vector spaces can also be 
defined for modules, although their properties are often quite different. We 
begin with submodules. 


Definition A submodule of an R-module M is a nonempty subset S of M that
is an R-module in its own right, under the operations obtained by restricting the
operations of M to S. We write S ≤ M to denote the fact that S is a submodule
of M. □

Theorem 4.1 A nonempty subset S of an R-module M is a submodule if and
only if it is closed under the taking of linear combinations, that is,

r, s ∈ R, u, v ∈ S ⇒ ru + sv ∈ S □

Theorem 4.2 If S and T are submodules of M, then S ∩ T and S + T are also
submodules of M. □


We have remarked that a commutative ring R with identity is a module over 
itself. As we will see, this type of module provides some good examples of non- 
vector-space-like behavior. 


When we think of a ring R as an R-module rather than as a ring, multiplication
is treated as scalar multiplication. This has some important implications. In
particular, if S is a submodule of R, then it is closed under scalar multiplication,
which means that it is closed under multiplication by all elements of the ring R.
In other words, S is an ideal of the ring R. Conversely, if ℐ is an ideal of the
ring R, then ℐ is also a submodule of the module R. Hence, the submodules of
the R-module R are precisely the ideals of the ring R.


Spanning Sets

The concept of spanning set carries over to modules as well.

Definition The submodule spanned (or generated) by a subset S of a module
M is the set of all linear combinations of elements of S:

⟨⟨S⟩⟩ = {r_1v_1 + ··· + r_nv_n | r_i ∈ R, v_i ∈ S, n ≥ 1}

A subset S ⊆ M is said to span M or generate M if M = ⟨⟨S⟩⟩. □

We use a double angle bracket notation for the submodule generated by a set
because when we study the F-vector space/F[x]-module V_τ, we will need to
make a distinction between the subspace ⟨v⟩ = Fv generated by v ∈ V and the
submodule ⟨⟨v⟩⟩ = F[x]v generated by v.


One very important point to note is that if a nontrivial linear combination of the
elements v_1, ..., v_n in an R-module M is 0,

r_1v_1 + ··· + r_nv_n = 0

where not all of the coefficients are 0, then we cannot conclude, as we could in
a vector space, that one of the elements v_i is a linear combination of the others.
After all, this involves dividing by one of the coefficients, which may not be
possible in a ring. For instance, for the Z-module Z × Z we have

2(3, 6) − 3(2, 4) = (0, 0)

but neither (3, 6) nor (2, 4) is an integer multiple of the other.
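The Z × Z example can be checked mechanically. The sketch below is only an illustration; the helper is_integer_multiple is a hypothetical name introduced here. It tests whether one integer vector is an integer multiple of another by reading off the only possible multiplier from a nonzero coordinate.

```python
def is_integer_multiple(x, y):
    """Return True if x = k*y for some integer k."""
    pivot = next((i for i, b in enumerate(y) if b != 0), None)
    if pivot is None:                     # y is the zero vector
        return all(a == 0 for a in x)
    if x[pivot] % y[pivot] != 0:          # the multiplier would not be an integer
        return False
    k = x[pivot] // y[pivot]
    return all(a == k * b for a, b in zip(x, y))

u, v = (3, 6), (2, 4)
combo = tuple(2 * a - 3 * b for a, b in zip(u, v))   # 2u - 3v

print(combo)
print(is_integer_multiple(u, v), is_integer_multiple(v, u))
```

Over the field Q the same relation would give u = (3/2)v; the failure here comes entirely from the fact that 2 and 3 have no inverses in Z.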




The following simple submodules play a special role in the theory. 


Definition Let M be an R-module. A submodule of the form

⟨⟨v⟩⟩ = Rv = {rv | r ∈ R}

for v ∈ M is called the cyclic submodule generated by v. □


Of course, any finite-dimensional vector space is the direct sum of cyclic 
submodules, that is, one-dimensional subspaces. One of our main goals is to 
show that a finitely generated module over a principal ideal domain has this 
property as well. 


Definition An R-module M is said to be finitely generated if it contains a 
finite set that generates M. More specifically, M is n-generated if it has a 
generating set of size n (although it may have a smaller generating set as 
well). □


Of course, a vector space is finitely generated if and only if it has a finite basis, 
that is, if and only if it is finite-dimensional. For modules, life is more 
complicated. The following is an example of a finitely generated module that 
has a submodule that is not finitely generated. 


Example 4.2 Let R be the ring F[x_1, x_2, ...] of all polynomials in infinitely
many variables over a field F. It will be convenient to use X to denote
x_1, x_2, ... and write a polynomial in R in the form p(X). (Each polynomial in
R, being a finite sum, involves only finitely many variables, however.) Then R
is an R-module and as such, is finitely generated by the identity element
p(X) = 1.

Now consider the submodule S of all polynomials with zero constant term. This
module is generated by the variables themselves,

S = ⟨⟨x_1, x_2, ...⟩⟩


However, S is not finitely generated. To see this, suppose that G = {p_1, ..., p_n}
is a finite generating set for S. Choose a variable x_k that does not appear in any
of the polynomials in G. Then no linear combination of the polynomials in G
can be equal to x_k. For if

x_k = Σ_{i=1}^{n} a_i(X) p_i(X)

then let a_i(X) = x_k q_i(X) + r_i(X) where r_i(X) does not involve x_k. This gives

x_k = Σ_{i=1}^{n} [x_k q_i(X) + r_i(X)] p_i(X)
    = x_k Σ_{i=1}^{n} q_i(X) p_i(X) + Σ_{i=1}^{n} r_i(X) p_i(X)

The last sum does not involve x_k and so it must equal 0. Hence, the first sum
must equal 1, which is not possible since p_i(X) has no constant term. □


Linear Independence

The concept of linear independence also carries over to modules.

Definition A subset S of an R-module M is linearly independent if for any
distinct v_1, ..., v_n ∈ S and r_1, ..., r_n ∈ R, we have

r_1v_1 + ··· + r_nv_n = 0 ⇒ r_i = 0 for all i

A set S that is not linearly independent is linearly dependent. □


It is clear from the definition that any subset of a linearly independent set is 
linearly independent. 


Recall that in a vector space, a set S of vectors is linearly dependent if and only 
if some vector in S is a linear combination of the other vectors in S. For 
arbitrary modules, this is not true. 


Example 4.3 Consider Z as a Z-module. The elements 2, 3 ∈ Z are linearly
dependent, since

3(2) − 2(3) = 0

but neither one is a linear combination (i.e., integer multiple) of the other. □


The problem in the previous example (as noted earlier) is that

r_1v_1 + ··· + r_nv_n = 0

implies that

r_1v_1 = −r_2v_2 − ··· − r_nv_n

but in general, we cannot divide both sides by r_1, since it may not have a
multiplicative inverse in the ring R.


Torsion Elements 


In a vector space V over a field F, singleton sets {v} where v ≠ 0 are linearly
independent. Put another way, r ≠ 0 and v ≠ 0 imply rv ≠ 0. However, in a
module, this need not be the case.


Example 4.4 The abelian group Z_n = {0, 1, ..., n − 1} is a Z-module, with
scalar multiplication defined by za = (z · a) mod n, for all z ∈ Z and a ∈ Z_n.
However, since na = 0 for all a ∈ Z_n, no singleton set {a} is linearly
independent. Indeed, Z_n has no linearly independent sets. □
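For a concrete modulus, the failure of linear independence can be listed explicitly. The following sketch assumes n = 6 and uses the hypothetical helper smul for the scalar multiplication of the example; for each nonzero a it collects the scalars z with 0 < z ≤ n for which za = 0.

```python
def smul(z, a, n):
    """Scalar multiplication of the Z-module Z_n: z*a = (z*a) mod n."""
    return (z * a) % n

n = 6
killed_by_n = all(smul(n, a, n) == 0 for a in range(n))

# For each nonzero a, collect the nonzero scalars 0 < z <= n that send a to 0.
witnesses = {a: [z for z in range(1, n + 1) if smul(z, a, n) == 0]
             for a in range(1, n)}

print(killed_by_n)
print(witnesses)
```

Every list contains n itself, which is the relation na = 0 used in the example; some elements are already killed by smaller scalars.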


This example motivates the following definition. 


Definition Let M be an R-module. A nonzero element v ∈ M for which rv = 0
for some nonzero r ∈ R is called a torsion element of M. A module that has no
nonzero torsion elements is said to be torsion-free. If all elements of M are
torsion elements, then M is a torsion module. The set of all torsion elements of
M, together with the zero element, is denoted by M_tor. □


If M is a module over an integral domain, it is not hard to see that M_tor is a
submodule of M and that M/M_tor is torsion-free. (We will define quotient
modules shortly: they are defined in the same way as for vector spaces.)


Annihilators 


Closely associated with the notion of a torsion element is that of an annihilator. 


Definition Let M be an R-module. The annihilator of an element v ∈ M is

ann(v) = {r ∈ R | rv = 0}

and the annihilator of a submodule N of M is

ann(N) = {r ∈ R | rN = {0}}

where rN = {rv | v ∈ N}. Annihilators are also called order ideals. □

It is easy to see that ann(v) and ann(N) are ideals of R. Clearly, v ∈ M is a
torsion element if and only if ann(v) ≠ {0}. Also, if A and B are submodules of
M, then

A ≤ B ⇒ ann(B) ≤ ann(A)

(note the reversal of order).
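Annihilators are easy to compute in the Z-module Z_n. The sketch below assumes n = 12 and uses illustrative helper names. Since every ideal of Z is generated by its smallest positive element, it searches for that generator of ann(a) and compares it with n/gcd(a, n), a standard fact that is not proved in the text. It also illustrates the reversal of order for the submodules A = ⟨⟨4⟩⟩ ≤ B = ⟨⟨2⟩⟩, looking only at scalars in a finite window.

```python
from math import gcd

def ann_generator(a, n):
    """Smallest positive r with r*a = 0 in Z_n, found by direct search."""
    return next(r for r in range(1, n + 1) if (r * a) % n == 0)

def ann_of_set(S, n, bound):
    """Scalars 1 <= r <= bound that annihilate every element of S in Z_n."""
    return {r for r in range(1, bound + 1) if all((r * a) % n == 0 for a in S)}

n = 12
generators = {a: ann_generator(a, n) for a in range(n)}
matches_formula = all(generators[a] == n // gcd(a, n) for a in range(n))

A = {(4 * z) % n for z in range(n)}      # the cyclic submodule generated by 4
B = {(2 * z) % n for z in range(n)}      # the cyclic submodule generated by 2
ann_A = ann_of_set(A, n, 2 * n)
ann_B = ann_of_set(B, n, 2 * n)

print(matches_formula, A <= B, ann_B <= ann_A)
```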


Let M = ⟨⟨u_1, ..., u_n⟩⟩ be a finitely generated module over an integral domain
R and assume that each of the generators u_i is torsion, that is, for each i, there is
a nonzero a_i ∈ ann(u_i). Then, the nonzero product a = a_1···a_n annihilates each
generator of M and therefore every element of M, that is, a ∈ ann(M). This
shows that ann(M) ≠ {0}. On the other hand, this may fail if R is not an
integral domain. Also, there are torsion modules whose annihilators are trivial.
(We leave verification of these statements as an exercise.)


Free Modules 


The definition of a basis for a module parallels that of a basis for a vector space. 


Definition Let M be an R-module. A subset B of M is a basis if B is linearly
independent and spans M. An R-module M is said to be free if M = {0} or if
M has a basis. If B is a basis for M, we say that M is free on B. □


We have the following analog of part of Theorem 1.7.


Theorem 4.3 A subset B of a module M is a basis if and only if every nonzero
v ∈ M is an essentially unique linear combination of the vectors in B. □


In a vector space, a set of vectors is a basis if and only if it is a minimal 
spanning set, or equivalently, a maximal linearly independent set. For modules, 
the following is the best we can do in general. We leave proof to the reader. 


Theorem 4.4 Let B be a basis for an R-module M. Then
1) B is a minimal spanning set.
2) B is a maximal linearly independent set. □


The Z-module Z_n has no basis since it has no linearly independent sets. But
since the entire module is a spanning set, we deduce that a minimal spanning set 
need not be a basis. In the exercises, the reader is asked to give an example of a 
module M that has a finite basis, but with the property that not every spanning 
set in M contains a basis and not every linearly independent set in M is 
contained in a basis. It follows in this case that a maximal linearly independent 
set need not be a basis. 


The next example shows that even free modules are not very much like vector 
spaces. It is an example of a free module that has a submodule that is not free. 


Example 4.5 The set Z × Z is a free module over itself, using componentwise
scalar multiplication

(n, m)(a, b) = (na, mb)

with basis {(1, 1)}. But the submodule Z × {0} is not free since it has no
linearly independent elements and hence no basis. □
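The reason that Z × {0} has no linearly independent elements is that the nonzero scalar (0, 1) annihilates all of it. The sketch below checks this, together with the basis property of {(1, 1)}, on a finite window of integers; the window and the helper name smul are choices made here for illustration, so this is a spot check rather than a proof.

```python
def smul(r, v):
    """Componentwise scalar multiplication in the (Z x Z)-module Z x Z."""
    return (r[0] * v[0], r[1] * v[1])

window = range(-3, 4)

# The nonzero scalar (0, 1) sends every element (a, 0) of Z x {0} to zero.
kills_submodule = all(smul((0, 1), (a, 0)) == (0, 0) for a in window)

# {(1, 1)} spans, since (a, b) = (a, b)(1, 1), and it is linearly
# independent, since r(1, 1) = (0, 0) only for the zero scalar r.
spans = all(smul((a, b), (1, 1)) == (a, b) for a in window for b in window)
independent = all(smul((a, b), (1, 1)) != (0, 0)
                  for a in window for b in window if (a, b) != (0, 0))

print(kills_submodule, spans, independent)
```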


Theorem 2.2 says that a linear transformation can be defined by specifying its
values arbitrarily on a basis. The same is true for free modules.


Theorem 4.5 Let M and N be R-modules where M is free with basis
B = {b_i | i ∈ I}. Then we can define a unique R-map τ: M → N by specifying
the values of τb_i arbitrarily for all b_i ∈ B and then extending τ to M by
linearity, that is,

τ(a_1v_1 + ··· + a_nv_n) = a_1τv_1 + ··· + a_nτv_n □


Homomorphisms

The term linear transformation is special to vector spaces. However, the
concept applies to most algebraic structures.

Definition Let M and N be R-modules. A function τ: M → N is an
R-homomorphism or R-map if it preserves the module operations, that is,

τ(ru + sv) = rτ(u) + sτ(v)

for all r, s ∈ R and u, v ∈ M. The set of all R-homomorphisms from M to N is
denoted by hom_R(M, N). The following terms are also employed:
1) An R-endomorphism is an R-homomorphism from M to itself.
2) An R-monomorphism or R-embedding is an injective R-homomorphism.
3) An R-epimorphism is a surjective R-homomorphism.
4) An R-isomorphism is a bijective R-homomorphism. □


It is easy to see that hom_R(M, N) is itself an R-module under addition of
functions and scalar multiplication defined by

(rτ)(v) = r(τv) = τ(rv)


Theorem 4.6 Let τ ∈ hom_R(M, N). The kernel and image of τ, defined as for
linear transformations by

ker(τ) = {v ∈ M | τv = 0}

and

im(τ) = {τv | v ∈ M}

are submodules of M and N, respectively. Moreover, τ is a monomorphism if
and only if ker(τ) = {0}. □

If N is a submodule of the R-module M, then the map j: N → M defined by
j(v) = v is evidently an R-monomorphism, called injection of N into M.


Quotient Modules


The procedure for defining quotient modules is the same as that for defining
quotient vector spaces. We summarize in the following theorem.


Theorem 4.7 Let S be a submodule of an R-module M. The binary relation

u ≡ v ⇔ u − v ∈ S

is an equivalence relation on M, whose equivalence classes are the cosets

v + S = {v + s | s ∈ S}

of S in M. The set M/S of all cosets of S in M, called the quotient module of
M modulo S, is an R-module under the well-defined operations

(u + S) + (v + S) = (u + v) + S
r(u + S) = ru + S

The zero element in M/S is the coset 0 + S = S. □


One question that immediately comes to mind is whether a quotient module of a 
free module must be free. As the next example shows, the answer is no. 


Example 4.6 As a module over itself, Z is free on the set {1}. For any n > 0,
the set Zn = {zn | z ∈ Z} is a free cyclic submodule of Z, but the quotient
Z-module Z/Zn is isomorphic to Z_n via the map

τ(u + Zn) = u mod n

and since Z_n is not free as a Z-module, neither is Z/Zn. □
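The map of this example is defined on cosets, so it must not depend on the representative chosen. The sketch below spot-checks this for n = 5 on a finite range of representatives; the names tau, reps and the chosen ranges are assumptions made for illustration. It also checks additivity and exhibits the torsion relation that prevents Z/Zn from being free.

```python
n = 5

def tau(u):
    """tau(u + Zn) = u mod n, evaluated on a representative u of the coset."""
    return u % n

reps = range(-10, 11)

# Changing the representative u to u + zn does not change the value.
well_defined = all(tau(u) == tau(u + z * n) for u in reps for z in range(-3, 4))

# tau respects addition of cosets.
additive = all(tau(u + v) == (tau(u) + tau(v)) % n for u in reps for v in reps)

# n(1 + Zn) is the zero coset although n is a nonzero scalar.
torsion = tau(n * 1) == 0

print(well_defined, additive, torsion)
```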
The Correspondence and Isomorphism Theorems

The correspondence and isomorphism theorems for vector spaces have analogs
for modules.


Theorem 4.8 (The correspondence theorem) Let S be a submodule of M.
Then the function that assigns to each intermediate submodule S ⊆ T ⊆ M the
quotient submodule T/S of M/S is an order-preserving (with respect to set
inclusion) one-to-one correspondence between submodules of M containing S
and all submodules of M/S. □


Theorem 4.9 (The first isomorphism theorem) Let τ: M → N be an
R-homomorphism. Then the map τ′: M/ker(τ) → N defined by

τ′(v + ker(τ)) = τv

is an R-embedding and so

M/ker(τ) ≈ im(τ) □


Theorem 4.10 (The second isomorphism theorem) Let M be an R-module
and let S and T be submodules of M. Then

(S + T)/T ≈ S/(S ∩ T) □


Theorem 4.11 (The third isomorphism theorem) Let M be an R-module and
suppose that S ⊆ T are submodules of M. Then

(M/S)/(T/S) ≈ M/T □


Direct Sums and Direct Summands 


The definition of direct sum of a family of submodules is a direct analog of the 
definition for vector spaces. 


Definition The external direct sum of R-modules M_1, ..., M_n, denoted by

M = M_1 ⊞ ··· ⊞ M_n

is the R-module whose elements are ordered n-tuples

M = {(v_1, ..., v_n) | v_i ∈ M_i, i = 1, ..., n}

with componentwise operations

(u_1, ..., u_n) + (v_1, ..., v_n) = (u_1 + v_1, ..., u_n + v_n)

and

r(v_1, ..., v_n) = (rv_1, ..., rv_n)

for r ∈ R. □

We leave it to the reader to formulate the definition of external direct sums and
products for arbitrary families of modules, in direct analogy with the case of
vector spaces.


Definition An R-module M is the (internal) direct sum of a family
F = {S_i | i ∈ I} of submodules of M, written

M = ⊕F or M = ⊕_{i∈I} S_i

if the following hold:
1) (Join of the family) M is the sum (join) of the family F:

M = Σ_{i∈I} S_i

2) (Independence of the family) For each i ∈ I,

S_i ∩ (Σ_{j≠i} S_j) = {0}

In this case, each S_i is called a direct summand of M. If F = {S_1, ..., S_n} is
a finite family, the direct sum is often written

M = S_1 ⊕ ··· ⊕ S_n

Finally, if M = S ⊕ T, then S is said to be complemented and T is called a
complement of S in M. □


As with vector spaces, we have the following useful characterization of direct 
sums. 


Theorem 4.12 Let F = {S_i | i ∈ I} be a family of distinct submodules of an
R-module M. The following are equivalent:
1) (Independence of the family) For each i ∈ I,

S_i ∩ (Σ_{j≠i} S_j) = {0}

2) (Uniqueness of expression for 0) The zero element 0 cannot be written as
a sum of nonzero elements from distinct submodules in F.

3) (Uniqueness of expression) Every nonzero v ∈ M has a unique, except for
order of terms, expression as a sum

v = s_1 + ··· + s_n

of nonzero elements from distinct submodules in F.

Hence, a sum

M = Σ_{i∈I} S_i

is direct if and only if any one of 1)-3) holds. □


In the case of vector spaces, every subspace is a direct summand, that is, every 
subspace has a complement. However, as the next example shows, this is not 
true for modules. 


Example 4.7 The set Z of integers is a Z-module. Since the submodules of Z
are precisely the ideals of the ring Z and since Z is a principal ideal domain, the
submodules of Z are the sets

⟨n⟩ = Zn = {zn | z ∈ Z}

Hence, any two nonzero proper submodules of Z have nonzero intersection, for
if n ≠ m > 0, then

Zn ∩ Zm = Zk

where k = lcm{n, m}. It follows that the only complemented submodules of Z
are Z and {0}. □
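The intersection formula can be observed on a finite window of integers. The sketch below assumes n = 4 and m = 6 and compares the common multiples of n and m in [−100, 100] with the multiples of lcm{n, m}; the helper multiples and the bound are illustrative choices.

```python
from math import gcd

def multiples(n, bound):
    """The part of the submodule Zn that lies in [-bound, bound]."""
    return {x for x in range(-bound, bound + 1) if x % n == 0}

n, m, bound = 4, 6, 100
k = n * m // gcd(n, m)                    # lcm{n, m}
intersection = multiples(n, bound) & multiples(m, bound)

print(k, intersection == multiples(k, bound))
```

Since the intersection always contains the nonzero element k, two nonzero submodules of Z can never form a direct sum, which is the point of the example.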


In the case of vector spaces, there is an intimate connection between subspaces 
and quotient spaces, as we saw in Theorem 3.6. The problem we face in 
generalizing this to modules is that not all submodules are complemented. 
However, this is the only problem. 


Theorem 4.13 Let S be a complemented submodule of M. All complements of
S are isomorphic to M/S and hence to each other.

Proof. For any complement T of S, the first isomorphism theorem applied to
the projection ρ_{T,S}: M → T gives T ≈ M/S. □


Direct Summands and Extensions of Isomorphisms 


Direct summands play a role in questions relating to whether certain module
homomorphisms σ: N → M_1 can be extended from a submodule N ≤ M to the
full module M. The discussion will be a bit simpler if we restrict attention to
epimorphisms.


If M = N ⊕ H, then a module epimorphism σ: N → M_1 can be extended to an
epimorphism σ̄: M → M_1 simply by sending the elements of H to zero, that is,
by setting

σ̄(n + h) = σn

This is easily seen to be an R-map with

ker(σ̄) = ker(σ) ⊕ H

Moreover, if τ is another extension of σ with the same kernel as σ̄, then τ and σ̄
agree on H as well as on N, whence τ = σ̄. Thus, there is a unique extension of
σ with kernel ker(σ) ⊕ H.


Now suppose that σ: N ≈ M_1 is an isomorphism. If N is complemented, that is,
if

M = N ⊕ H

then we have seen that there is a unique extension σ̄ of σ for which ker(σ̄) = H.
Thus, the correspondence

H ↦ σ̄, where ker(σ̄) = H

from complements of N to extensions of σ is an injection. To see that this
correspondence is a bijection, if σ̄: M → M_1 is an extension of σ, then

M = N ⊕ ker(σ̄)

To see this, we have

N ∩ ker(σ̄) = ker(σ) = {0}

and if a ∈ M, then there is a b ∈ N for which σb = σ̄a and so

σ̄(a − b) = σ̄a − σb = 0

Thus,

a = b + (a − b) ∈ N + ker(σ̄)

which shows that ker(σ̄) is a complement of N.
Theorem 4.14 Let M and M_1 be R-modules and let N ≤ M.
1) If M = N ⊕ H, then any R-epimorphism σ: N → M_1 has a unique
extension σ̄: M → M_1 to an epimorphism with

ker(σ̄) = ker(σ) ⊕ H

2) Let σ: N ≈ M_1 be an R-isomorphism. Then the correspondence

H ↦ σ̄, where ker(σ̄) = H

is a bijection from complements of N onto the extensions of σ. Thus, an
isomorphism σ: N ≈ M_1 has an extension to M if and only if N is
complemented. □


Definition Let N ≤ M. When the identity map ι: N ≈ N has an extension to
σ: M → N, the submodule N is called a retract of M and σ is called the
retraction map. □


Corollary 4.15 A submodule N ≤ M is a retract of M if and only if N has a
complement in M. □

Direct Summands and One-Sided Invertibility 

Direct summands are also related to one-sided invertibility of R-maps. 


Definition Let τ: A → B be a module homomorphism.
1) A left inverse of τ is a module homomorphism τ_L: B → A for which
τ_L ∘ τ = ι.
2) A right inverse of τ is a module homomorphism τ_R: B → A for which
τ ∘ τ_R = ι.

Left and right inverses are called one-sided inverses. An ordinary inverse is
called a two-sided inverse. □


Unlike a two-sided inverse, one-sided inverses need not be unique. 


A left-invertible homomorphism σ must be injective, since

σa = σb ⇒ σ_L ∘ σa = σ_L ∘ σb ⇒ a = b

Also, a right-invertible homomorphism σ: A → B must be surjective, since if
b ∈ B, then

b = σ[σ_R(b)] ∈ im(σ)

For set functions, the converses of these statements hold: σ is left-invertible if
and only if it is injective and σ is right-invertible if and only if it is surjective.
However, this is not the case for R-maps.
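A small example of an injective R-map with no left inverse is doubling on the Z-module Z. The sketch below relies on the standard fact, not stated in the text, that every Z-map from Z to Z is multiplication by the integer k = τ(1), so a left inverse would require 2k = 1. The helper names and the finite search window are assumptions made for illustration. For contrast, it also defines a left inverse of the same map as a set function and shows that this function is not additive.

```python
def sigma(x):
    """The injective Z-map sigma: Z -> Z, sigma(x) = 2x."""
    return 2 * x

def z_map(k):
    """The Z-map x -> k*x; every Z-map Z -> Z has this form with k = tau(1)."""
    return lambda x: k * x

window = range(-50, 51)
injective = len({sigma(x) for x in window}) == len(window)

# A left inverse tau would need tau(sigma(1)) = 2k = 1; no integer k qualifies.
left_inverses = [k for k in window if z_map(k)(sigma(1)) == 1]

def set_left_inverse(y):
    """A left inverse of sigma as a set function: halve even numbers."""
    return y // 2 if y % 2 == 0 else 0

inverts = all(set_left_inverse(sigma(x)) == x for x in window)
additive = set_left_inverse(1) + set_left_inverse(1) == set_left_inverse(2)

print(injective, left_inverses, inverts, additive)
```

The image of doubling is the submodule Z2, which is not complemented in Z, so the absence of a module left inverse matches the discussion of direct summands.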


Let σ: M → M_1 be an injective R-map. Referring to Figure 4.1,

Figure 4.1

the map σ|^{im(σ)}: M ≈ im(σ) obtained from σ by restricting its range to im(σ) is
an isomorphism and the left inverses σ_L of σ are precisely the extensions of
(σ|^{im(σ)})^{-1}: im(σ) ≈ M to M_1. Hence, Theorem 4.14 says that the
correspondence

H ↦ extension of (σ|^{im(σ)})^{-1} with kernel H

is a bijection from the complements H of im(σ) onto the left inverses of σ.


Now let σ: M → M_1 be a surjective R-map. Referring to Figure 4.2,

Figure 4.2

if ker(σ) is complemented, that is, if

M = ker(σ) ⊕ H

then σ|_H: H ≈ M_1 is an isomorphism. Thus, a map τ: M_1 → M is a right
inverse of σ if and only if τ is a range-extension of (σ|_H)^{-1}: M_1 ≈ H, the only
difference being in the ranges of the two functions. Hence, (σ|_H)^{-1}: M_1 → M
is the only right inverse of σ with image H. It follows that the correspondence

H ↦ (σ|_H)^{-1}: M_1 → M

is an injection from the complements H of ker(σ) to the right inverses of σ.
Moreover, this map is a bijection, since if σ_R: M_1 → M is a right inverse of σ,
then σ_R: M_1 ≈ im(σ_R) and σ is an extension of σ_R^{-1}: im(σ_R) ≈ M_1, which
implies that

M = im(σ_R) ⊕ ker(σ)


Theorem 4.16 Let M and M_1 be R-modules and let σ: M → M_1 be an R-map.
1) Let σ: M → M_1 be injective. The map

H ↦ extension of (σ|^{im(σ)})^{-1} with kernel H

is a bijection from the complements H of im(σ) onto the left inverses of σ.
Thus, there is exactly one left inverse of σ for each complement of im(σ)
and that complement is the kernel of the left inverse.

2) Let σ: M → M_1 be surjective. The map

H ↦ (σ|_H)^{-1}: M_1 → M

is a bijection from the complements H of ker(σ) to the right inverses of σ.
Thus, there is exactly one right inverse of σ for each complement H of
ker(σ) and that complement is the image of the right inverse. Thus,

M = ker(σ) ⊕ H ≈ ker(σ) ⊞ im(σ) □
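The correspondence between right inverses and complements of the kernel can be seen in a small case. The sketch below assumes the surjective Z-map from Z × Z to Z that picks the first coordinate, whose kernel is {0} × Z. For each integer k, the hypothetical map tau_k(x) = (x, kx) is a right inverse with image H_k = {(x, kx)}, and every vector splits as a kernel element plus an element of H_k. The checks run over a finite window only, so they illustrate rather than prove the statement.

```python
def sigma(v):
    """The surjective Z-map sigma: Z^2 -> Z, sigma(a, b) = a."""
    return v[0]

def right_inverse(k):
    """The Z-map tau_k(x) = (x, k*x), whose image is H_k = {(x, k*x)}."""
    return lambda x: (x, k * x)

def split(v, k):
    """Write v = (a, b) as a kernel part plus a part in H_k."""
    a, b = v
    return (0, b - k * a), (a, k * a)

window = range(-5, 6)
ks = (-2, 0, 3)

is_right_inverse = all(sigma(right_inverse(k)(x)) == x
                       for k in ks for x in window)
decomposes = all(
    tuple(p + q for p, q in zip(*split((a, b), k))) == (a, b)
    for k in ks for a in window for b in window
)

print(is_right_inverse, decomposes)
```

Different values of k give different right inverses, which also illustrates that one-sided inverses need not be unique.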


The last part of the previous theorem is worth further comment. Recall that if
τ: V → W is a linear transformation on vector spaces, then

V ≈ ker(τ) ⊞ im(τ)

This holds for modules as well provided that ker(τ) is a direct summand.


Modules Are Not as Nice as Vector Spaces 


Here is a list of some of the properties of modules (over commutative rings with 
identity) that emphasize the differences between modules and vector spaces. 


1) A submodule of a module need not have a complement. 

2) A submodule of a finitely generated module need not be finitely generated. 

3) There exist modules with no linearly independent elements and hence with 
no basis. 

4) A minimal spanning set or maximal linearly independent set is not 
necessarily a basis. 


5) There exist free modules with submodules that are not free.

6) There exist free modules with linearly independent sets that are not
contained in a basis and spanning sets that do not contain a basis.


Recall also that a module over a noncommutative ring may have bases of 
different sizes. However, all bases for a free module over a commutative ring 
with identity have the same size, as we will prove in the next chapter. 


Exercises 


1. Give the details to show that any commutative ring with identity is a
module over itself.

2. Let S = {v_1, ..., v_n} be a subset of a module M. Prove that N = ⟨⟨S⟩⟩ is
the smallest submodule of M containing S. First you will need to formulate
precisely what it means to be the smallest submodule of M containing S.

3. Let M be an R-module and let ℐ be an ideal in R. Let ℐM be the set of all
finite sums of the form

r_1v_1 + ··· + r_nv_n

where r_i ∈ ℐ and v_i ∈ M. Is ℐM a submodule of M?

4. Show that if S and T are submodules of M, then (with respect to set
inclusion)

S ∩ T = glb{S, T} and S + T = lub{S, T}


5. Let S_1 ⊆ S_2 ⊆ ··· be an ascending sequence of submodules of an
R-module M. Prove that the union ⋃S_i is a submodule of M.

6. Give an example of a module M that has a finite basis but with the property
that not every spanning set in M contains a basis and not every linearly
independent set in M is contained in a basis.

7. Show that, just as in the case of vector spaces, an R-homomorphism can be
defined by assigning arbitrary values on the elements of a basis and
extending by linearity.

8. Let τ ∈ hom_R(M, N) be an R-isomorphism. If B is a basis for M, prove
that τB = {τb | b ∈ B} is a basis for N.

9. Let M be an R-module and let τ ∈ hom_R(M, M) be an R-endomorphism.
If τ is idempotent, that is, if τ^2 = τ, show that

M = ker(τ) ⊕ im(τ)

Does the converse hold?

10. Consider the ring R = F[x, y] of polynomials in two variables. Show that
the set M consisting of all polynomials in R that have zero constant term is
an R-module. Show that M is not a free R-module.

11. Prove that if R is an integral domain, then all R-modules M have the
following property: If v_1, ..., v_n is linearly independent over R, then so is
rv_1, ..., rv_n for any nonzero r ∈ R.


12. Prove that if a nonzero commutative ring R with identity has the property
that every finitely generated R-module is free then R is a field.

13. Let M and N be R-modules. If S is a submodule of M and T is a
submodule of N show that

(M ⊕ N)/(S ⊕ T) ≈ (M/S) ⊕ (N/T)

14. If R is a commutative ring with identity and ℐ is an ideal of R, then ℐ is an
R-module. What is the maximum size of a linearly independent set in ℐ?
Under what conditions is ℐ free?

15. a) Show that for any module M over an integral domain the set M_tor of all
torsion elements in a module M is a submodule of M.
b) Find an example of a ring R with the property that for some R-module
M the set M_tor is not a submodule.
c) Show that for any module M over an integral domain, the quotient
module M/M_tor is torsion-free.

16. a) Find a module M that is finitely generated by torsion elements but for
which ann(M) = {0}.
b) Find a torsion module M for which ann(M) = {0}.

17. Let N be an abelian group together with a scalar multiplication over a ring
R that satisfies all of the properties of an R-module except that 1v does not
necessarily equal v for all v ∈ N. Show that N can be written as a direct
sum of an R-module N_0 and another "pseudo R-module" N_1.

18. Prove that hom_R(M, N) is an R-module under addition of functions and
scalar multiplication defined by

(rτ)(v) = r(τv) = τ(rv)

19. Prove that any R-module M is isomorphic to the R-module hom_R(R, M).

20. Let R and S be commutative rings with identity and let f: R → S be a ring
homomorphism. Show that any S-module is also an R-module under the
scalar multiplication

rv = f(r)v

21. Prove that hom_Z(Z_n, Z_m) ≈ Z_d where d = gcd(n, m).

22. Suppose that R is a commutative ring with identity. If ℐ and 𝒥 are ideals of
R for which R/ℐ ≈ R/𝒥 as R-modules, then prove that ℐ = 𝒥. Is the
result true if R/ℐ ≈ R/𝒥 as rings?


Chapter 5 
Modules II: Free and Noetherian Modules 


The Rank of a Free Module 


Since all bases for a vector space V have the same cardinality, the concept of 
vector space dimension is well-defined. A similar statement holds for free R- 
modules when the base ring is commutative (but not otherwise). 


Theorem 5.1 Let M be a free module over a commutative ring R with identity. 
1) Then any two bases of M have the same cardinality. 

2) The cardinality of a spanning set is greater than or equal to that of a basis. 
Proof. The plan is to find a vector space V with the property that, for any basis 
for M, there is a basis of the same cardinality for V. Then we can appeal to the 
corresponding result for vector spaces. 


Let $\mathcal{I}$ be a maximal ideal of R, which exists by Theorem 0.23. Then $R/\mathcal{I}$ is a
field. Our first thought might be that M is a vector space over $R/\mathcal{I}$, but that is
not the case. In fact, scalar multiplication using the field $R/\mathcal{I}$,

$$(r + \mathcal{I})v = rv$$

is not even well-defined, since this would require that $\mathcal{I}M = \{0\}$. On the other
hand, we can fix precisely this problem by factoring out the submodule

$$\mathcal{I}M = \{a_1v_1 + \cdots + a_nv_n \mid a_i \in \mathcal{I},\ v_i \in M\}$$

Indeed, $M/\mathcal{I}M$ is a vector space over $R/\mathcal{I}$, with scalar multiplication defined
by

$$(r + \mathcal{I})(u + \mathcal{I}M) = ru + \mathcal{I}M$$

To see that this is well-defined, we must show that the conditions

$$r + \mathcal{I} = r' + \mathcal{I}$$
$$u + \mathcal{I}M = u' + \mathcal{I}M$$

imply

$$ru + \mathcal{I}M = r'u' + \mathcal{I}M$$

But this follows from the fact that

$$ru - r'u' = r(u - u') + (r - r')u' \in \mathcal{I}M$$

Hence, scalar multiplication is well-defined. We leave it to the reader to show
that $M/\mathcal{I}M$ is a vector space over $R/\mathcal{I}$.


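Consider now a set $\mathcal{B} = \{b_i \mid i \in I\} \subseteq M$ and the corresponding set

$$\mathcal{B} + \mathcal{I}M = \{b_i + \mathcal{I}M \mid i \in I\} \subseteq \frac{M}{\mathcal{I}M}$$

If $\mathcal{B}$ spans M over R, then $\mathcal{B} + \mathcal{I}M$ spans $M/\mathcal{I}M$ over $R/\mathcal{I}$. To see this, note
that any $v \in M$ has the form $v = \sum_j r_{i_j} b_{i_j}$ for $r_{i_j} \in R$ and so

$$v + \mathcal{I}M = \Big(\sum_j r_{i_j} b_{i_j}\Big) + \mathcal{I}M = \sum_j r_{i_j}(b_{i_j} + \mathcal{I}M) = \sum_j (r_{i_j} + \mathcal{I})(b_{i_j} + \mathcal{I}M)$$

which shows that $\mathcal{B} + \mathcal{I}M$ spans $M/\mathcal{I}M$.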


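Now suppose that $\mathcal{B} = \{b_i \mid i \in I\}$ is a basis for M over R. We show that
$\mathcal{B} + \mathcal{I}M$ is a basis for $M/\mathcal{I}M$ over $R/\mathcal{I}$. We have seen that $\mathcal{B} + \mathcal{I}M$ spans
$M/\mathcal{I}M$. Also, if

$$\sum_j (r_{i_j} + \mathcal{I})(b_{i_j} + \mathcal{I}M) = \mathcal{I}M$$

then $\sum_j r_{i_j} b_{i_j} \in \mathcal{I}M$. Since every element of M is an R-linear combination of
elements of $\mathcal{B}$, every element of $\mathcal{I}M$ is a linear combination of elements of $\mathcal{B}$
with coefficients in $\mathcal{I}$, and so

$$\sum_j r_{i_j} b_{i_j} = \sum_k a_{i_k} b_{i_k}$$

where $a_{i_k} \in \mathcal{I}$. From the linear independence of $\mathcal{B}$ we deduce that $r_{i_j} \in \mathcal{I}$ for all
$j$ and so $r_{i_j} + \mathcal{I} = \mathcal{I}$. Hence $\mathcal{B} + \mathcal{I}M$ is linearly independent and therefore a
basis, as desired.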


To see that $|\mathcal{B}| = |\mathcal{B} + \mathcal{I}M|$, note that if $b_i + \mathcal{I}M = b_k + \mathcal{I}M$, then

$$b_i - b_k = \sum_j a_{i_j} b_{i_j}$$

where $a_{i_j} \in \mathcal{I}$. If $b_i \ne b_k$, then the coefficient of $b_i$ on the right must be equal to
1 and so $1 \in \mathcal{I}$, which is not possible since $\mathcal{I}$ is a maximal ideal. Hence,
$b_i = b_k$.

Thus, if $\mathcal{B}$ is a basis for M over R, then

$$|\mathcal{B}| = |\mathcal{B} + \mathcal{I}M| = \dim_{R/\mathcal{I}}(M/\mathcal{I}M)$$

and so all bases for M over R have the same cardinality, which proves part 1).

Finally, if $\mathcal{B}$ spans M over R, then $\mathcal{B} + \mathcal{I}M$ spans $M/\mathcal{I}M$ and so

$$\dim_{R/\mathcal{I}}(M/\mathcal{I}M) \le |\mathcal{B} + \mathcal{I}M| \le |\mathcal{B}|$$

Thus, $\mathcal{B}$ has cardinality at least as great as that of any basis for M over R. □
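As a concrete illustration of the reduction used in this proof, the following sketch takes R to be the integers, the maximal ideal to be pZ for a prime p, and M = Z^n, so that the quotient of M by the submodule described above is the vector space (Z/pZ)^n. The function name rank_mod_p and the sample matrices are assumptions made for this illustration and do not come from the text. It checks that a basis of Z^2 reduces to a basis modulo every prime tried, that a spanning set still spans after reduction, and that a merely independent set may collapse.

```python
def rank_mod_p(rows, p):
    """Rank over the field Z/pZ of the integer matrix with the given rows (p prime)."""
    m = [[x % p for x in row] for row in rows]
    ncols = len(m[0]) if m else 0
    rank = 0
    for col in range(ncols):
        # Look for a pivot in this column among the rows not yet used.
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)  # inverse in Z/pZ (Python 3.8+)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][col] != 0:
                factor = m[i][col]
                m[i] = [(a - factor * b) % p for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank

basis = [[1, 2], [1, 3]]              # determinant 1, so the rows are a basis of Z^2
spanning = [[2, 0], [3, 0], [0, 1]]   # spans Z^2 because (3,0) - (2,0) = (1,0)
independent = [[2, 0], [0, 1]]        # independent over Z but not spanning

for p in (2, 3, 5):
    print(p, rank_mod_p(basis, p), rank_mod_p(spanning, p), rank_mod_p(independent, p))
```

The third column drops to 1 for p = 2, which reflects the fact that only spanning sets, not independent sets, are guaranteed to survive the passage to the quotient.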


The previous theorem allows us to define the rank of a free module. (The term 
dimension is not used for modules in general.) 


Definition Let R be a commutative ring with identity. The rank rk(M) of a 
nonzero free R-module M is the cardinality of any basis for M. The rank of the 
trivial module {0} is 0. □


Theorem 5.1 fails if the underlying ring of scalars is not commutative. The next 
example describes a module over a noncommutative ring that has the 
remarkable property of possessing a basis of size n for any positive integer n. 


Example 5.1 Let V be a vector space over F with a countably infinite basis
$\mathcal{B} = \{b_0, b_1, \dots\}$. Let $\mathcal{L}(V)$ be the ring of linear operators on V. Observe that
$\mathcal{L}(V)$ is not commutative, since composition of functions is not commutative.

The ring $\mathcal{L}(V)$ is an $\mathcal{L}(V)$-module and as such, the identity map $\iota$ forms a basis
for $\mathcal{L}(V)$. However, we can also construct a basis for $\mathcal{L}(V)$ of any desired finite
size n. To understand the idea, consider the case n = 2 and define the operators
$\beta_1$ and $\beta_2$ by

$$\beta_1(b_{2k}) = b_k, \quad \beta_1(b_{2k+1}) = 0$$

and

$$\beta_2(b_{2k}) = 0, \quad \beta_2(b_{2k+1}) = b_k$$

These operators are linearly independent essentially because they are surjective
and their supports are disjoint. In particular, if

$$f\beta_1 + g\beta_2 = 0$$

then

$$0 = (f\beta_1 + g\beta_2)(b_{2k}) = f(b_k)$$

and

$$0 = (f\beta_1 + g\beta_2)(b_{2k+1}) = g(b_k)$$

which shows that $f = 0$ and $g = 0$. Moreover, if $h \in \mathcal{L}(V)$, then we define $f$
and $g$ by

$$f(b_k) = h(b_{2k}), \quad g(b_k) = h(b_{2k+1})$$

from which it follows easily that

$$h = f\beta_1 + g\beta_2$$

which shows that $\{\beta_1, \beta_2\}$ is a basis for $\mathcal{L}(V)$.

More generally, we begin by partitioning $\mathcal{B}$ into n blocks. For each
$s = 0, \dots, n-1$, let

$$\mathcal{B}_s = \{b_i \mid i \equiv s \bmod n\}$$

Now we define elements $\beta_s \in \mathcal{L}(V)$ by

$$\beta_s(b_{kn+t}) = \delta_{t,s} b_k$$

where $0 \le t < n$ and where $\delta_{t,s}$ is the Kronecker delta function. These functions
are surjective and have disjoint support. It follows that $\mathcal{C}_n = \{\beta_0, \dots, \beta_{n-1}\}$ is
linearly independent. For if

$$0 = \alpha_0\beta_0 + \cdots + \alpha_{n-1}\beta_{n-1}$$

where $\alpha_i \in \mathcal{L}(V)$, then, applying this to $b_{kn+t}$ gives

$$0 = \alpha_t\beta_t(b_{kn+t}) = \alpha_t(b_k)$$

for all k. Hence, $\alpha_t = 0$.

Also, $\mathcal{C}_n$ spans $\mathcal{L}(V)$, for if $\tau \in \mathcal{L}(V)$, we define $\alpha_s \in \mathcal{L}(V)$ by

$$\alpha_s(b_k) = \tau(b_{kn+s})$$

to get

$$(\alpha_0\beta_0 + \cdots + \alpha_{n-1}\beta_{n-1})(b_{kn+t}) = \alpha_t\beta_t(b_{kn+t}) = \alpha_t(b_k) = \tau(b_{kn+t})$$

and so

$$\tau = \alpha_0\beta_0 + \cdots + \alpha_{n-1}\beta_{n-1}$$

Thus, $\mathcal{C}_n = \{\beta_0, \dots, \beta_{n-1}\}$ is a basis for $\mathcal{L}(V)$ of size n. □
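The spanning argument of this example can be checked mechanically on finitely many basis vectors. The sketch below is only an illustration under stated assumptions: a vector with finite support is modeled as a dictionary from basis index to coefficient, an operator is modeled by its action on basis indices, and the helper names (apply, beta, compose, add, coordinates) as well as the sample operator tau are hypothetical choices, not notation from the text.

```python
def apply(op, vec):
    """Apply a linear operator (given on basis indices) to a finitely supported vector."""
    out = {}
    for k, c in vec.items():
        for j, d in op(k).items():
            out[j] = out.get(j, 0) + c * d
    return {j: c for j, c in out.items() if c != 0}

def beta(n, s):
    """The operator sending b_{kn+t} to b_k when t == s and to 0 otherwise."""
    return lambda i: {i // n: 1} if i % n == s else {}

def compose(f, g):
    """The product f g in the operator ring: first g, then f."""
    return lambda i: apply(f, g(i))

def add(*ops):
    """Pointwise sum of operators."""
    def total(i):
        out = {}
        for op in ops:
            for j, c in op(i).items():
                out[j] = out.get(j, 0) + c
        return {j: c for j, c in out.items() if c != 0}
    return total

def coordinates(tau, n):
    """The operators alpha_s with alpha_s(b_k) = tau(b_{kn+s}), for s = 0, ..., n-1."""
    return [lambda k, s=s: tau(k * n + s) for s in range(n)]

# Hypothetical sample operator: tau(b_i) = b_{i+1} + 2 b_0.
tau = lambda i: {i + 1: 1, 0: 2}
n = 3
alphas = coordinates(tau, n)
rebuilt = add(*(compose(alphas[s], beta(n, s)) for s in range(n)))

# tau and the combination of the beta_s agree on the first 50 basis vectors.
print(all(rebuilt(i) == tau(i) for i in range(50)))
```

Only finitely many basis vectors are tested, so this is a sanity check of the formulas rather than a proof.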


Recall that if B is a basis for a vector space V over F, then V is isomorphic to
the vector space $(F^B)_0$ of all functions from B to F that have finite support. A
similar result holds for free R-modules. We begin with the fact that $(R^B)_0$ is a
free R-module. The simple proof is left to the reader.


Theorem 5.2 Let B be any set and let R be a commutative ring with identity.
The set $(R^B)_0$ of all functions from B to R that have finite support is a free R-
module of rank |B| with basis $\mathcal{B} = \{\delta_b \mid b \in B\}$ where

$$\delta_b(x) = \begin{cases} 1 & \text{if } x = b \\ 0 & \text{if } x \ne b \end{cases}$$

This basis is referred to as the standard basis for $(R^B)_0$. □


Theorem 5.3 Let M be an R-module. If B is a basis for M, then M is
isomorphic to $(R^B)_0$.
Proof. Consider the map $\tau: M \to (R^B)_0$ defined by setting

$$\tau b = \delta_b$$

where $\delta_b$ is defined in Theorem 5.2 and extending $\tau$ to M by linearity. Since $\tau$
maps a basis for M to a basis $\mathcal{B} = \{\delta_b\}$ for $(R^B)_0$, it follows that $\tau$ is an
isomorphism from M to $(R^B)_0$. □


Theorem 5.4 Two free R-modules (over a commutative ring) are isomorphic if
and only if they have the same rank.

Proof. If $M \approx N$, then any isomorphism $\tau$ from M to N maps a basis for M to
a basis for N. Since $\tau$ is a bijection, we have rk(M) = rk(N). Conversely,
suppose that rk(M) = rk(N). Let B be a basis for M and let C be a basis for N.
Since |B| = |C|, there is a bijective map $\tau: B \to C$. This map can be extended by
linearity to an isomorphism of M onto N and so $M \approx N$. □


We have seen that the cardinality of a (minimal) spanning set for a free module 
M is at least equal to rk( M). Let us now speak about the cardinality of maximal 
linearly independent sets. 


Theorem 5.5 Let R be an integral domain and let M be a free R-module. Then
all linearly independent sets have cardinality at most rk(M).

Proof. Let $\kappa$ = rk(M). Since $M \approx (R^\kappa)_0$ we need only prove the result for
$(R^\kappa)_0$. Let Q be the field of quotients of R. Then $(Q^\kappa)_0$ is a vector space. Now, if

$$B = \{v_i \mid i \in I\} \subseteq (R^\kappa)_0 \subseteq (Q^\kappa)_0$$

is linearly independent over Q as a subset of $(Q^\kappa)_0$, then B is clearly linearly
independent over R as a subset of $(R^\kappa)_0$. Conversely, suppose that B is linearly
independent over R and

$$\frac{r_1}{s_1} v_{i_1} + \cdots + \frac{r_k}{s_k} v_{i_k} = 0$$

where $r_i, s_i \in R$ and $s_i \ne 0$ for all i. Multiplying by $s = s_1 \cdots s_k \ne 0$
produces a linear relation over R,

$$\frac{s}{s_1} r_1 v_{i_1} + \cdots + \frac{s}{s_k} r_k v_{i_k} = 0$$

Since B is linearly independent over R, each coefficient $(s/s_i) r_i$ is 0 and, since R
is an integral domain, this implies that $r_i = 0$ for all i. Thus B is linearly
dependent over R if and only if it is linearly dependent over Q. But in the vector
space $(Q^\kappa)_0$, all sets of cardinality greater than $\kappa$ are linearly dependent over Q
and hence all subsets of $(R^\kappa)_0$ of cardinality greater than $\kappa$ are linearly
dependent over R. □
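The step of clearing denominators in this proof can be illustrated with a small computation. The sketch below assumes R is the ring of integers and Q the rationals; the helper names clear_denominators and combine and the sample vectors are assumptions chosen for this illustration. As in the proof, the rational coefficients are multiplied by the product s of their denominators.

```python
from fractions import Fraction
from math import prod

def clear_denominators(coeffs):
    """Multiply rational coefficients by the product s of their denominators."""
    s = prod(c.denominator for c in coeffs)
    return [int(c * s) for c in coeffs]

def combine(coeffs, vectors):
    """Return the linear combination sum(c_i * v_i), coordinate by coordinate."""
    return [sum(c * v[j] for c, v in zip(coeffs, vectors))
            for j in range(len(vectors[0]))]

vectors = [(2, 4), (3, 6)]                     # hypothetical sample vectors in Z^2
rational = [Fraction(1, 2), Fraction(-1, 3)]   # (1/2) v_1 - (1/3) v_2 = 0 over Q
integral = clear_denominators(rational)        # multiply through by s = 2 * 3

print(combine(rational, vectors))   # the zero vector, with rational coefficients
print(integral)                     # integer coefficients of a dependency over Z
print(combine(integral, vectors))   # again the zero vector
```

A nontrivial dependency over Q therefore yields a nontrivial dependency over Z, which is the contrapositive of the direction proved above.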


Free Modules and Epimorphisms 


If $\sigma: M \to F$ is a module epimorphism where F is free on B, then it is easy to
define a right inverse for $\sigma$, since we can define an R-map $\sigma_R: F \to M$ by
specifying its values arbitrarily on B and extending by linearity. Thus, we take
$\sigma_R(b)$ to be any member of $\sigma^{-1}(b)$. Then Theorem 4.16 implies that $\ker(\sigma)$ is a
direct summand of M and

$$M \approx \ker(\sigma) \boxplus F$$

This discussion applies to the canonical projection $\pi: M \to M/S$ provided that
the quotient M/S is free.


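Theorem 5.6 Let R be a commutative ring with identity.
1) If $\sigma: M \to F$ is an R-epimorphism and F is free, then $\ker(\sigma)$ is
complemented and

$$M = \ker(\sigma) \oplus N \approx \ker(\sigma) \boxplus F$$

where $N \approx F$.
2) If S is a submodule of M and if M/S is free, then S is complemented and

$$M \approx S \boxplus \frac{M}{S}$$

If M, S and M/S are free, then

$$\mathrm{rk}(M) = \mathrm{rk}(S) + \mathrm{rk}\left(\frac{M}{S}\right)$$

and if the ranks are all finite, then

$$\mathrm{rk}\left(\frac{M}{S}\right) = \mathrm{rk}(M) - \mathrm{rk}(S)$$ □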


Noetherian Modules 


One of the most desirable properties of a finitely generated R-module M is that
all of its submodules be finitely generated:

$$M \text{ finitely generated},\ S \le M \ \Rightarrow\ S \text{ finitely generated}$$

Example 4.2 shows that this is not always the case and leads us to search for
conditions on the ring R that will guarantee this property for R-modules.


Definition An R-module M is said to satisfy the ascending chain condition
(abbreviated ACC) on submodules if every ascending sequence of submodules

$$S_1 \subseteq S_2 \subseteq S_3 \subseteq \cdots$$

of M is eventually constant, that is, there exists an index k for which

$$S_k = S_{k+1} = S_{k+2} = \cdots$$

Modules with the ascending chain condition on submodules are also called
Noetherian modules (after Emmy Noether, one of the pioneers of module
theory). □


Since a ring R is a module over itself and since the submodules of the module R 
are precisely the ideals of the ring R, the preceding definition can be formulated 
for rings as follows. 


Definition A ring R is said to satisfy the ascending chain condition
(abbreviated ACC) on ideals if any ascending sequence

$$\mathcal{I}_1 \subseteq \mathcal{I}_2 \subseteq \mathcal{I}_3 \subseteq \cdots$$

of ideals of R is eventually constant, that is, there exists an index k for which

$$\mathcal{I}_k = \mathcal{I}_{k+1} = \mathcal{I}_{k+2} = \cdots$$

A ring that satisfies the ascending chain condition on ideals is called a
Noetherian ring. □


The following theorem describes the relevance of this to the present discussion. 


Theorem 5.7 

1) An R-module M is Noetherian if and only if every submodule of M is 
finitely generated. 

2) In particular, a ring R is Noetherian if and only if every ideal of R is 
finitely generated. 

Proof. Suppose that all submodules of M are finitely generated and that M
contains an infinite ascending sequence

$$S_1 \subseteq S_2 \subseteq S_3 \subseteq \cdots \qquad (5.1)$$

of submodules. Then the union

$$S = \bigcup_j S_j$$

is easily seen to be a submodule of M. Hence, S is finitely generated, say
$S = \langle\langle u_1, \dots, u_n \rangle\rangle$. Since $u_i \in S$, there exists an index $k_i$ such that $u_i \in S_{k_i}$.
Therefore, if $k = \max\{k_1, \dots, k_n\}$, we have

$$\{u_1, \dots, u_n\} \subseteq S_k$$

and so

$$S = \langle\langle u_1, \dots, u_n \rangle\rangle \subseteq S_k \subseteq S_{k+1} \subseteq S_{k+2} \subseteq \cdots \subseteq S$$

which shows that the chain (5.1) is eventually constant.

For the converse, suppose that M satisfies the ACC on submodules and let S be
a submodule of M. Pick $u_1 \in S$ and consider the submodule $S_1 = \langle\langle u_1 \rangle\rangle \subseteq S$
generated by $u_1$. If $S_1 = S$, then S is finitely generated. If $S_1 \ne S$, then there is
a $u_2 \in S - S_1$. Now let $S_2 = \langle\langle u_1, u_2 \rangle\rangle$. If $S_2 = S$, then S is finitely generated.
If $S_2 \ne S$, then pick $u_3 \in S - S_2$ and consider the submodule
$S_3 = \langle\langle u_1, u_2, u_3 \rangle\rangle$.

Continuing in this way, we get an ascending chain of submodules

$$\langle\langle u_1 \rangle\rangle \subseteq \langle\langle u_1, u_2 \rangle\rangle \subseteq \langle\langle u_1, u_2, u_3 \rangle\rangle \subseteq \cdots \subseteq S$$

If none of these submodules were equal to S, we would have an infinite
ascending chain of submodules, each properly contained in the next, which
contradicts the fact that M satisfies the ACC on submodules. Hence,
$S = \langle\langle u_1, \dots, u_n \rangle\rangle$ for some n and so S is finitely generated. □
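For a feeling of how such chains behave, consider the ring of integers as a module over itself, where the submodule generated by $u_1, \dots, u_k$ is the ideal generated by their greatest common divisor. The sketch below is an illustration under that assumption only; the function names and the sample list are hypothetical, and the elements are taken from a fixed list rather than chosen outside the previous submodule, so the chain it produces is ascending but not strictly ascending.

```python
from math import gcd

def generator_chain(elements):
    """Generators d_k of the ideals <u_1, ..., u_k> of Z, where d_k = gcd(u_1, ..., u_k)."""
    chain, d = [], 0
    for u in elements:
        d = gcd(d, u)
        chain.append(d)
    return chain

def stabilization_index(chain):
    """Smallest 1-based index k from which the finite chain is constant."""
    k = len(chain)
    while k > 1 and chain[k - 2] == chain[-1]:
        k -= 1
    return k

us = [360, 84, 90, 45, 12, 6, 30]      # hypothetical sample elements of Z
chain = generator_chain(us)
print(chain)                           # [360, 12, 6, 3, 3, 3, 3]
print(stabilization_index(chain))      # 4

# <d_k> is contained in <d_{k+1}> exactly when d_{k+1} divides d_k.
print(all(a % b == 0 for a, b in zip(chain, chain[1:])))
```

Each proper inclusion removes at least one prime factor from the generator, which is why a strictly ascending chain of ideals of the integers starting at a nonzero ideal must be finite.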


Our goal is to find conditions under which all finitely generated R-modules are 
Noetherian. The very pleasing answer is that all finitely generated R-modules 
are Noetherian if and only if R is Noetherian as an R-module, or equivalently, 
as a ring. 


Theorem 5.8 Let R be a commutative ring with identity. 

1) R is Noetherian if and only if every finitely generated R-module is 
Noetherian. 

2) Let R be a principal ideal domain. If an R-module M is n-generated, then
any submodule of M is also n-generated.

Proof. For part 1), one direction is evident. Assume that R is Noetherian and
let $M = \langle\langle u_1, \dots, u_n \rangle\rangle$ be a finitely generated R-module. Consider the
epimorphism $\tau: R^n \to M$ defined by

$$\tau(r_1, \dots, r_n) = r_1u_1 + \cdots + r_nu_n$$

Let S be a submodule of M. Then

$$\tau^{-1}(S) = \{u \in R^n \mid \tau u \in S\}$$

is a submodule of $R^n$ and $\tau(\tau^{-1}S) = S$. If every submodule of $R^n$ is finitely
generated, then $\tau^{-1}(S)$ is finitely generated and so $\tau^{-1}(S) = \langle\langle v_1, \dots, v_k \rangle\rangle$.
Then S is finitely generated by $\{\tau v_1, \dots, \tau v_k\}$. Thus, it is sufficient to prove the
theorem for $R^n$, which we do by induction on n.

If n = 1, any submodule of R is an ideal of R, which is finitely generated by
assumption. Assume that every submodule of $R^k$ is finitely generated for all
$1 \le k < n$ and let S be a submodule of $R^n$.


If n > 1, we can extract from S something that is isomorphic to an ideal of R
and so will be finitely generated. In particular, let $S_1$ be the "last coordinates" in
S, specifically, let

$$S_1 = \{(0, \dots, 0, a_n) \mid (a_1, \dots, a_{n-1}, a_n) \in S \text{ for some } a_1, \dots, a_{n-1} \in R\}$$

The set $S_1$ is isomorphic to an ideal of R and is therefore finitely generated, say
$S_1 = \langle\langle G_1 \rangle\rangle$, where $G_1 = \{g_1, \dots, g_k\}$ is a finite subset of $S_1$.

Also, let

$$S_2 = \{v \in S \mid v = (a_1, \dots, a_{n-1}, 0) \text{ for some } a_1, \dots, a_{n-1} \in R\}$$

be the set of all elements of S that have last coordinate equal to 0. Note that $S_2$
is a submodule of $R^n$ and is isomorphic to a submodule of $R^{n-1}$. Hence, the
inductive hypothesis implies that $S_2$ is finitely generated, say $S_2 = \langle\langle G_2 \rangle\rangle$, where
$G_2$ is a finite subset of $S_2$.

By definition of $S_1$, each $g_i \in G_1$ has the form

$$g_i = (0, \dots, 0, g_{in})$$

for $g_{in} \in R$ where there is a $\bar{g}_i \in S$ of the form

$$\bar{g}_i = (g_{i1}, \dots, g_{i,n-1}, g_{in})$$

Let $\bar{G}_1 = \{\bar{g}_1, \dots, \bar{g}_k\}$. We claim that S is generated by the finite set $\bar{G}_1 \cup G_2$.


To see this, let $v = (a_1, \dots, a_n) \in S$. Then $(0, \dots, 0, a_n) \in S_1$ and so

$$(0, \dots, 0, a_n) = \sum_{i=1}^{k} r_i g_i$$

for $r_i \in R$. Consider now the sum

$$w = \sum_{i=1}^{k} r_i \bar{g}_i \in \langle\langle \bar{G}_1 \rangle\rangle$$

The last coordinate of this sum is

$$\sum_{i=1}^{k} r_i g_{in} = a_n$$

and so the difference $v - w$ has last coordinate 0 and is thus in $S_2 = \langle\langle G_2 \rangle\rangle$.
Hence

$$v = (v - w) + w \in \langle\langle \bar{G}_1 \rangle\rangle + \langle\langle G_2 \rangle\rangle = \langle\langle \bar{G}_1 \cup G_2 \rangle\rangle$$

as desired.

For part 2), we leave it to the reader to review the proof and make the necessary
changes. The key fact is that $S_1$ is isomorphic to an ideal of R, which is
principal. Hence, $S_1$ is generated by a single element of M. □


The Hilbert Basis Theorem 


Theorem 5.8 naturally leads us to ask which familiar rings are Noetherian. The 
following famous theorem describes one very important case. 


Theorem 5.9 (Hilbert basis theorem) If a ring R is Noetherian, then so is the
polynomial ring R[x].

Proof. We wish to show that any ideal $\mathcal{I}$ in R[x] is finitely generated. Let L
denote the set of all leading coefficients of polynomials in $\mathcal{I}$, together with the 0
element of R. Then L is an ideal of R.

To see this, observe that if $\alpha \in L$ is the leading coefficient of $f(x) \in \mathcal{I}$ and if
$r \in R$, then either $r\alpha = 0$ or else $r\alpha$ is the leading coefficient of $rf(x) \in \mathcal{I}$. In
either case, $r\alpha \in L$. Similarly, suppose that $\beta \in L$ is the leading coefficient of
$g(x) \in \mathcal{I}$. We may assume that $\deg f(x) = i$ and $\deg g(x) = j$, with $i \le j$. Then
$h(x) = x^{j-i} f(x)$ is in $\mathcal{I}$, has leading coefficient $\alpha$ and has the same degree as
$g(x)$. Hence, either $\alpha - \beta$ is 0 or $\alpha - \beta$ is the leading coefficient of
$h(x) - g(x) \in \mathcal{I}$. In either case $\alpha - \beta \in L$.


Since L is an ideal of the Noetherian ring R, it must be finitely generated, say
$L = \langle \alpha_1, \dots, \alpha_m \rangle$. Since $\alpha_i \in L$, there exist polynomials $f_i(x) \in \mathcal{I}$ with leading
coefficient $\alpha_i$. By multiplying each $f_i(x)$ by a suitable power of x, we may
assume that

$$\deg f_i(x) = d = \max\{\deg f_i(x)\}$$

for all $i = 1, \dots, m$.

Now for $k = 0, \dots, d-1$ let $L_k$ be the set of all leading coefficients of
polynomials in $\mathcal{I}$ of degree k, together with the 0 element of R. A similar
argument shows that $L_k$ is an ideal of R and so $L_k$ is also finitely generated.
Hence, we can find polynomials $P_k = \{p_{k,1}(x), \dots, p_{k,n_k}(x)\}$ of degree k in $\mathcal{I}$ whose
leading coefficients constitute a generating set for $L_k$.


Consider now the finite set

$$P = \left( \bigcup_{k=0}^{d-1} P_k \right) \cup \{f_1(x), \dots, f_m(x)\}$$

If $\mathcal{J}$ is the ideal generated by P, then $\mathcal{J} \subseteq \mathcal{I}$. An induction argument can be
used to show that $\mathcal{J} = \mathcal{I}$. If $g(x) \in \mathcal{I}$ has degree 0, then it is a linear
combination of the elements of $P_0$ (which are constants) and is thus in $\mathcal{J}$.
Assume that any polynomial in $\mathcal{I}$ of degree less than k is in $\mathcal{J}$ and let $g(x) \in \mathcal{I}$
have degree k.

If $k < d$, then some linear combination $h(x)$ over R of the polynomials in $P_k$
has the same leading coefficient as $g(x)$ and if $k \ge d$, then some linear
combination $h(x)$ of the polynomials

$$\{x^{k-d} f_1(x), \dots, x^{k-d} f_m(x)\}$$

has the same leading coefficient as $g(x)$. In either case, there is a polynomial
$h(x) \in \mathcal{J}$ that has the same degree and the same leading coefficient as $g(x)$.
Since $g(x) - h(x) \in \mathcal{I}$ has degree strictly smaller than that of $g(x)$ the induction
hypothesis implies that

$$g(x) - h(x) \in \mathcal{J}$$

and so

$$g(x) = [g(x) - h(x)] + h(x) \in \mathcal{J}$$

This completes the induction and shows that $\mathcal{I} = \mathcal{J}$ is finitely generated. □
Exercises 


1. If M is a free R-module and $\tau: M \to N$ is an epimorphism, then must N
also be free?

2. Let $\mathcal{I}$ be an ideal of R. Prove that if $R/\mathcal{I}$ is a free R-module, then $\mathcal{I}$ is the
zero ideal.

3. Prove that the union of an ascending chain of submodules is a submodule.

4. Let S be a submodule of an R-module M. Show that if M is finitely
generated, so is the quotient module M/S.

5. Let S be a submodule of an R-module M. Show that if both S and M/S are
finitely generated, then so is M.

6. Show that an R-module M satisfies the ACC for submodules if and only if
the following condition holds. Every nonempty collection $\mathcal{S}$ of submodules
of M has a maximal element. That is, for every nonempty collection $\mathcal{S}$ of
submodules of M there is an $S \in \mathcal{S}$ that is not properly contained in any
member of $\mathcal{S}$:

$$T \in \mathcal{S},\ S \subseteq T \ \Rightarrow\ T = S$$

7. Let $\tau: M \to N$ be an R-homomorphism.
a) Show that if M is finitely generated, then so is $\mathrm{im}(\tau)$.
b) Show that if $\ker(\tau)$ and $\mathrm{im}(\tau)$ are finitely generated, then
$M = \ker(\tau) + S$ where S is a finitely generated submodule of M.
Hence, M is finitely generated.

8. If R is Noetherian and $\mathcal{I}$ is an ideal of R show that $R/\mathcal{I}$ is also Noetherian.

9. Prove that if R is Noetherian, then so is $R[x_1, \dots, x_n]$.

10. Find an example of a commutative ring with identity that does not satisfy
the ascending chain condition.

11. a) Prove that an R-module M is cyclic if and only if it is isomorphic to
$R/\mathcal{I}$ where $\mathcal{I}$ is an ideal of R.
b) Prove that an R-module M is simple ($M \ne \{0\}$ and M has no proper
nonzero submodules) if and only if it is isomorphic to $R/\mathcal{I}$ where $\mathcal{I}$ is
a maximal ideal of R.
c) Prove that for any nonzero commutative ring R with identity, a simple
R-module exists.

12. Prove that the condition that R be a principal ideal domain in part 2) of
Theorem 5.8 is required.

13. Prove Theorem 5.8 in the following way.
a) Show that if $T \subseteq S$ are submodules of M and if T and S/T are
finitely generated, then so is S.
b) The proof is again by induction. Assuming it is true for any module
generated by n elements, let $M = \langle\langle v_1, \dots, v_{n+1} \rangle\rangle$ and let
$M' = \langle\langle v_1, \dots, v_n \rangle\rangle$. Then let $T = S \cap M'$ in part a).

14. Prove that any R-module M is isomorphic to the quotient of a free module
F. If M is finitely generated, then F can also be taken to be finitely
generated.

15. Prove that if S and T are isomorphic submodules of a module M it does
not necessarily follow that the quotient modules M/S and M/T are
isomorphic. Prove also that if $S \oplus T_1 \approx S \oplus T_2$ as modules it does not
necessarily follow that $T_1 \approx T_2$. Prove that these statements do hold if all
modules are free and have finite rank.


Chapter 6 
Modules over a Principal Ideal Domain 


We remind the reader of a few of the basic properties of principal ideal 
domains. 


Theorem 6.1 Let R be a principal ideal domain.

1) An element $r \in R$ is irreducible if and only if the ideal $\langle r \rangle$ is maximal.

2) An element in R is prime if and only if it is irreducible.

3) R is a unique factorization domain.

4) R satisfies the ascending chain condition on ideals. Hence, so does any
finitely generated R-module M. Moreover, if M is n-generated, then any
submodule of M is n-generated.


Annihilators and Orders 


When R is a principal ideal domain, all annihilators are generated by a single 
element. This permits the following definition. 


Definition Let R be a principal ideal domain and let M be an R-module.

1) If N is a submodule of M, then any generator of ann(N) is called an order
of N.

2) An order of an element $v \in M$ is an order of the submodule $\langle\langle v \rangle\rangle$. □

For readers acquainted with group theory, we mention that the order of a
module corresponds to the smallest exponent of a group, not to the order of the
group.


Theorem 6.2 Let R be a principal ideal domain and let M be an R-module.

1) If $\alpha$ is an order of $N \le M$, then the orders of N are precisely the
associates of $\alpha$. We denote any order of N by o(N) and, as is customary,
refer to o(N) as "the" order of N.

2) If $M = A \oplus B$, then

$$o(M) = \mathrm{lcm}(o(A), o(B))$$

that is, the orders of M are precisely the least common multiples of the
orders of A and B.
Proof. We leave proof of part 1) for the reader. For part 2), suppose that

$$o(M) = \delta, \quad o(A) = \alpha, \quad o(B) = \beta, \quad \lambda = \mathrm{lcm}(\alpha, \beta)$$

Then $\delta A = \{0\}$ and $\delta B = \{0\}$ imply that $\alpha \mid \delta$ and $\beta \mid \delta$ and so $\lambda \mid \delta$. On the
other hand, $\lambda$ annihilates both A and B and therefore also $M = A \oplus B$. Hence,
$\delta \mid \lambda$ and so $\lambda$ and $\delta$ are associates, which shows that $\lambda$ is an order of M. □
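Part 2) can be checked by brute force in the simplest setting, where R is the ring of integers and A and B are the cyclic groups $\mathbb{Z}_a$ and $\mathbb{Z}_b$. The sketch below is only an illustration under that assumption; the function name order_of_sum is hypothetical, and the search returns the positive generator of the annihilator, which is one particular choice among the associates.

```python
from math import gcd

def order_of_sum(a, b):
    """Smallest positive r with r*(x, y) = 0 for every (x, y) in Z_a (+) Z_b."""
    r = 1
    while any((r * x) % a or (r * y) % b for x in range(a) for y in range(b)):
        r += 1
    return r

def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)

for a, b in [(4, 6), (3, 5), (2, 4)]:
    print(a, b, order_of_sum(a, b), lcm(a, b))
```

In each row the last two numbers agree, as the theorem predicts.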


Cyclic Modules 


The simplest type of nonzero module is clearly a cyclic module. Despite their 
simplicity, cyclic modules will play a very important role in our study of linear 
operators on a finite-dimensional vector space and so we want to explore some 
of their basic properties, including their composition and decomposition. 


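Theorem 6.3 Let R be a principal ideal domain.

1) If $\langle\langle v \rangle\rangle$ is a cyclic R-module with annihilator $\langle \alpha \rangle$, then the multiplication
map $\tau: R \to \langle\langle v \rangle\rangle$ defined by $\tau r = rv$ is an R-epimorphism with kernel $\langle \alpha \rangle$.
Hence the induced map

$$\bar{\tau}: \frac{R}{\langle \alpha \rangle} \to \langle\langle v \rangle\rangle$$

defined by

$$\bar{\tau}(r + \langle \alpha \rangle) = rv$$

is an isomorphism. In other words, cyclic R-modules are isomorphic to
quotient modules of the base ring R.

2) Any submodule of a cyclic R-module is cyclic.
3) If $\langle\langle v \rangle\rangle$ is a cyclic submodule of M of order $\alpha$, then for $\beta \in R$,

$$o(\langle\langle \beta v \rangle\rangle) = \frac{\alpha}{\gcd(\alpha, \beta)}$$

Also,

$$\langle\langle \beta v \rangle\rangle = \langle\langle v \rangle\rangle \ \Leftrightarrow\ (o(v), \beta) = 1 \ \Leftrightarrow\ o(\beta v) = o(v)$$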


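Proof. We leave proof of part 1) as an exercise. For part 2), let $S \le \langle\langle v \rangle\rangle$. Then
$I = \{r \in R \mid rv \in S\}$ is an ideal of R and so $I = \langle s \rangle$ for some $s \in R$. Thus,

$$S = Iv = Rsv = \langle\langle sv \rangle\rangle$$

For part 3), we have $r(\beta v) = 0$ if and only if $(r\beta)v = 0$, that is, if and only if
$\alpha \mid r\beta$, which is equivalent to

$$\gamma := \frac{\alpha}{\gcd(\alpha, \beta)} \ \Big|\ r$$

Thus, $r \in \mathrm{ann}(\beta v)$ if and only if $r \in \langle \gamma \rangle$ and so $\mathrm{ann}(\beta v) = \langle \gamma \rangle$. For the second
statement, if $(\alpha, \beta) = 1$ then there exist $a, b \in R$ for which $a\alpha + b\beta = 1$ and so

$$v = (a\alpha + b\beta)v = b\beta v \in \langle\langle \beta v \rangle\rangle \subseteq \langle\langle v \rangle\rangle$$

and so $\langle\langle \beta v \rangle\rangle = \langle\langle v \rangle\rangle$. Of course, if $\langle\langle \beta v \rangle\rangle = \langle\langle v \rangle\rangle$ then $o(\beta v) = \alpha$. Finally, if
$o(\beta v) = \alpha$, then

$$\alpha = o(\beta v) = \frac{\alpha}{\gcd(\alpha, \beta)}$$

and so $(\alpha, \beta) = 1$. □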
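Part 3) is easy to test numerically when R is the ring of integers and the cyclic module is $\mathbb{Z}_n$ with generator v = 1, so that $\alpha = n$. The following sketch makes exactly that assumption; the helper names order_in_Zn and generates are hypothetical and chosen for this illustration.

```python
from math import gcd

def order_in_Zn(x, n):
    """Smallest positive r with r*x congruent to 0 modulo n."""
    r = 1
    while (r * x) % n != 0:
        r += 1
    return r

def generates(x, n):
    """True when the multiples of x fill all of Z_n."""
    return {(k * x) % n for k in range(n)} == set(range(n))

n = 12
for beta in range(1, 2 * n + 1):
    # o(beta * v) = alpha / gcd(alpha, beta) with alpha = n and v = 1
    assert order_in_Zn(beta, n) == n // gcd(n, beta)
    # beta * v generates the whole module exactly when gcd(alpha, beta) = 1
    assert generates(beta, n) == (gcd(n, beta) == 1)
print("checked all beta from 1 to", 2 * n)
```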
The Decomposition of Cyclic Modules 
The following theorem shows how cyclic modules can be composed and 
decomposed. 


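Theorem 6.4 Let M be an R-module.
1) (Composing cyclic modules) If $u_1, \dots, u_n \in M$ have pairwise relatively prime
orders, then

$$o(u_1 + \cdots + u_n) = o(u_1) \cdots o(u_n)$$

and

$$\langle\langle u_1 \rangle\rangle \oplus \cdots \oplus \langle\langle u_n \rangle\rangle = \langle\langle u_1 + \cdots + u_n \rangle\rangle$$

Consequently, if

$$M = A_1 + \cdots + A_n$$

where the submodules $A_i$ have pairwise relatively prime orders, then the sum is
direct.

2) (Decomposing cyclic modules) If $o(v) = \alpha_1 \cdots \alpha_n$ where the $\alpha_i$'s are
pairwise relatively prime, then $v$ has the form

$$v = u_1 + \cdots + u_n$$

where $o(u_i) = \alpha_i$ and so

$$\langle\langle v \rangle\rangle = \langle\langle u_1 + \cdots + u_n \rangle\rangle = \langle\langle u_1 \rangle\rangle \oplus \cdots \oplus \langle\langle u_n \rangle\rangle$$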


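Proof. For part 1), let $\alpha_k = o(u_k)$, $\mu := \alpha_1 \cdots \alpha_n$ and $v := u_1 + \cdots + u_n$. Then
since $\mu$ annihilates $v$, the order of $v$ divides $\mu$. If $o(v)$ is a proper divisor of $\mu$,
then for some index k, there is a prime $p \mid \alpha_k$ for which $\mu/p$ annihilates $v$. But
$\mu/p$ annihilates each $u_i$ for $i \ne k$. Thus,

$$0 = \frac{\mu}{p} v = \frac{\mu}{p} u_k = \frac{\alpha_k}{p} \left( \frac{\mu}{\alpha_k} u_k \right)$$

Since $o(u_k)$ and $\mu/\alpha_k$ are relatively prime, the order of $(\mu/\alpha_k)u_k$ is equal to
$o(u_k) = \alpha_k$, which contradicts the equation above. Hence, $o(v) = \mu$.

It is clear that $\langle\langle u_1 + \cdots + u_n \rangle\rangle \subseteq \langle\langle u_1 \rangle\rangle \oplus \cdots \oplus \langle\langle u_n \rangle\rangle$. For the reverse
inclusion, since $\alpha_1$ and $\mu/\alpha_1$ are relatively prime, there exist $r, s \in R$ for which

$$r\alpha_1 + s\frac{\mu}{\alpha_1} = 1$$

Hence

$$u_1 = \left( r\alpha_1 + s\frac{\mu}{\alpha_1} \right) u_1 = s\frac{\mu}{\alpha_1} u_1 = s\frac{\mu}{\alpha_1}(u_1 + \cdots + u_n) \in \langle\langle u_1 + \cdots + u_n \rangle\rangle$$

Similarly, $u_k \in \langle\langle u_1 + \cdots + u_n \rangle\rangle$ for all k and so we get the reverse inclusion.

Finally, to see that the sum above is direct, note that if

$$v_1 + \cdots + v_n = 0$$

where $v_i \in A_i$, then each $v_i$ must be 0, for otherwise the order of the sum on the
left would be different from 1.

For part 2), let $\mu = \alpha_1 \cdots \alpha_n$. The scalars $\beta_k = \mu/\alpha_k$ are relatively prime and
so there exist $a_i \in R$ for which

$$a_1\beta_1 + \cdots + a_n\beta_n = 1$$

Hence,

$$v = (a_1\beta_1 + \cdots + a_n\beta_n)v = a_1\beta_1 v + \cdots + a_n\beta_n v$$

Since $o(\beta_k v) = \mu/\gcd(\mu, \beta_k) = \alpha_k$ and since $a_k$ and $\alpha_k$ are relatively prime,
we have $o(a_k\beta_k v) = \alpha_k$. The second statement follows from part 1). □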
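The decomposition in part 2) can be carried out explicitly in $\mathbb{Z}_\mu$ with v = 1. The sketch below is an illustration under several assumptions: R is the ring of integers, the function names are hypothetical, and instead of an exact identity $a_1\beta_1 + \cdots + a_n\beta_n = 1$ it uses coefficients for which the identity holds only modulo $\mu$. That weaker identity is enough here because $\mu$ annihilates v.

```python
from math import gcd, prod

def order_in_Zn(x, n):
    """Order of x in Z_n, computed as n / gcd(n, x)."""
    return n // gcd(n, x)

def decompose(alphas):
    """Split v = 1 in Z_mu into parts u_k = a_k * beta_k with o(u_k) = alpha_k.

    The alphas are assumed to be pairwise relatively prime and mu is their product.
    """
    mu = prod(alphas)
    parts = []
    for alpha_k in alphas:
        beta_k = mu // alpha_k
        a_k = pow(beta_k, -1, alpha_k)   # a_k * beta_k = 1 modulo alpha_k (Python 3.8+)
        parts.append((a_k * beta_k) % mu)
    return mu, parts

mu, parts = decompose([4, 9, 5])
print(mu, parts)                               # 180 [45, 100, 36]
print(sum(parts) % mu)                         # 1, so the parts add up to v
print([order_in_Zn(u, mu) for u in parts])     # [4, 9, 5]
```

The parts have the prescribed orders and sum to v, so $\mathbb{Z}_{180}$ is the direct sum of the cyclic submodules they generate, of orders 4, 9 and 5.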


Free Modules over a Principal Ideal Domain 


We have seen that a submodule of a free module need not be free: The
submodule $\mathbb{Z} \times \{0\}$ of the module $\mathbb{Z} \times \mathbb{Z}$ over itself is not free. However, if R
is a principal ideal domain this cannot happen.


Theorem 6.5 Let M be a free module over a principal ideal domain R. Then
any submodule S of M is also free and $\mathrm{rk}(S) \le \mathrm{rk}(M)$.

Proof. We will give the proof first for modules of finite rank and then
generalize to modules of arbitrary rank. Since $M \approx R^n$ where n = rk(M) is
finite, we may in fact assume that $M = R^n$. For each $1 \le k \le n$, let

$$I_k = \{r \in R \mid (a_1, \dots, a_{k-1}, r, 0, \dots, 0) \in S \text{ for some } a_1, \dots, a_{k-1} \in R\}$$

Then it is easy to see that $I_k$ is an ideal of R and so $I_k = \langle r_k \rangle$ for some $r_k \in R$.
Let

$$u_k = (a_1, \dots, a_{k-1}, r_k, 0, \dots, 0) \in S$$

We claim that

$$\mathcal{B} = \{u_k \mid k = 1, \dots, n \text{ and } r_k \ne 0\}$$

is a basis for S. As to linear independence, suppose that

$$\mathcal{B} = \{u_{j_1}, \dots, u_{j_m}\}$$

with $j_1 < \cdots < j_m$ and that

$$a_{j_1}u_{j_1} + \cdots + a_{j_m}u_{j_m} = 0$$

Then comparing the $j_m$th coordinates gives $a_{j_m}r_{j_m} = 0$ and since $r_{j_m} \ne 0$, it
follows that $a_{j_m} = 0$. In a similar way, all coefficients are 0 and so $\mathcal{B}$ is linearly
independent.

To see that $\mathcal{B}$ spans S, we partition the elements $x \in S$ according to the largest
coordinate index $i(x)$ with nonzero entry and induct on $i(x)$. If $i(x) = 0$, then
$x = 0$, which is in the span of $\mathcal{B}$. Suppose that all $x \in S$ with $i(x) < k$ are in
the span of $\mathcal{B}$ and let $i(x) = k$, that is,

$$x = (a_1, \dots, a_k, 0, \dots, 0)$$

where $a_k \ne 0$. Then $a_k \in I_k$ and so $r_k \ne 0$ and $a_k = cr_k$ for some $c \in R$.
Hence, $i(x - cu_k) < k$ and so $y = x - cu_k \in \langle\langle \mathcal{B} \rangle\rangle$ and therefore $x \in \langle\langle \mathcal{B} \rangle\rangle$.
Thus, $\mathcal{B}$ is a basis for S.
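When R is the ring of integers and S is given by finitely many generators, a basis of the triangular shape used in this proof can be computed. The sketch below is an illustration under those assumptions and is not the argument of the text: it works from the last coordinate down and uses the Euclidean algorithm on rows to find a single generator $r_k$ of each ideal $I_k$. The function name triangular_basis and the sample generators are hypothetical.

```python
def triangular_basis(generators):
    """Basis of the Z-submodule of Z^n spanned by the given nonempty list of integer rows.

    The result lists the vectors u_k with r_k != 0 in increasing order of k; each u_k
    has r_k in coordinate k and zeros in all later coordinates.
    """
    rows = [list(g) for g in generators]
    n = len(rows[0])
    basis = []
    for k in reversed(range(n)):
        # Invariant: every remaining row is zero in all coordinates after k.
        while True:
            nonzero = [r for r in rows if r[k] != 0]
            if len(nonzero) <= 1:
                break
            # Euclidean step: reduce the other rows modulo the smallest entry.
            nonzero.sort(key=lambda r: abs(r[k]))
            pivot = nonzero[0]
            for r in nonzero[1:]:
                q = r[k] // pivot[k]
                for j in range(n):
                    r[j] -= q * pivot[j]
        if nonzero:
            u = nonzero[0]          # its k-th entry generates I_k
            basis.append(u)
            rows = [r for r in rows if r is not u]
    return basis[::-1]

print(triangular_basis([(2, 4), (6, 8), (4, 4)]))   # [[2, 0], [2, 4]]
print(triangular_basis([(1, 0), (0, 1), (1, 1)]))   # [[1, 0], [0, 1]]
```

In the first example $r_1 = 2$ and $r_2 = 4$, so the submodule is free of rank 2 even though it was presented with three generators.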


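The previous proof can be generalized in a more or less direct way to modules
of arbitrary rank. In this case, we may assume that $M = (R^\kappa)_0$ is the R-module
of functions with finite support from $\kappa$ to R, where $\kappa$ is a cardinal number. We
use the fact that $\kappa$ is a well-ordered set, that is, $\kappa$ is a totally ordered set in which
any nonempty subset has a smallest element. If $\alpha \in \kappa$, the closed interval $[0, \alpha]$
is

$$[0, \alpha] = \{x \in \kappa \mid 0 \le x \le \alpha\}$$

Let $S \le M$. For each $0 \le \alpha < \kappa$, let

$$M_\alpha = \{f \in S \mid \mathrm{supp}(f) \subseteq [0, \alpha]\}$$

Then the set

$$I_\alpha = \{f(\alpha) \mid f \in M_\alpha\}$$

is an ideal of R and so $I_\alpha = \langle f_\alpha(\alpha) \rangle$ for some $f_\alpha \in M_\alpha \subseteq S$. We show that

$$\mathcal{B} = \{f_\alpha \mid 0 \le \alpha < \kappa,\ f_\alpha(\alpha) \ne 0\}$$

is a basis for S. First, suppose that

$$r_1 f_{\alpha_1} + \cdots + r_n f_{\alpha_n} = 0$$

where $\alpha_i < \alpha_j$ for $i < j$. Applying this to $\alpha_n$ gives

$$r_n f_{\alpha_n}(\alpha_n) = 0$$

and since R is an integral domain, $r_n = 0$. Similarly, $r_i = 0$ for all i and so $\mathcal{B}$ is
linearly independent.

To show that $\mathcal{B}$ spans S, since any $f \in S$ has finite support, there is a largest
index $\alpha_f = i(f)$ for which $f(\alpha_f) \ne 0$. Now, if $\langle\langle \mathcal{B} \rangle\rangle < S$, then since $\kappa$ is well-
ordered, we may choose a $g \in S \setminus \langle\langle \mathcal{B} \rangle\rangle$ for which $\alpha = \alpha_g = i(g)$ is as small as
possible. Then $g \in M_\alpha$. Moreover, since $0 \ne g(\alpha) \in I_\alpha$, it follows that
$f_\alpha(\alpha) \ne 0$ and $g(\alpha) = c f_\alpha(\alpha)$ for some $c \in R$. Then

$$\mathrm{supp}(g - cf_\alpha) \subseteq [0, \alpha]$$

and

$$(g - cf_\alpha)(\alpha) = g(\alpha) - c f_\alpha(\alpha) = 0$$

and so $i(g - cf_\alpha) < \alpha$, which implies that $g - cf_\alpha \in \langle\langle \mathcal{B} \rangle\rangle$. But then

$$g = (g - cf_\alpha) + cf_\alpha \in \langle\langle \mathcal{B} \rangle\rangle$$

a contradiction. Thus, $\mathcal{B}$ is a basis for S. □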


In a vector space of dimension n, any set of n linearly independent vectors is a 
basis. This fails for modules. For example, Z is a Z-module of rank 1 but the 
independent set {2} is not a basis. On the other hand, the fact that a spanning set 
of size n is a basis does hold for modules over a principal ideal domain, as we 
now show. 


Theorem 6.6 Let M be a free R-module of finite rank n, where R is a principal 
ideal domain. Let S = {s1,... , Sn} be a spanning set for M. Then S is a basis 
for M. 

Proof. Let $\mathcal{B} = \{b_1, \dots, b_n\}$ be a basis for M and define the map $\tau: M \to M$ by
$\tau b_i = s_i$ and extending to a surjective R-homomorphism. Since M is free,
Theorem 5.6 implies that

$$M \approx \ker(\tau) \boxplus \mathrm{im}(\tau) = \ker(\tau) \boxplus M$$

Since $\ker(\tau)$ is a submodule of the free module M and since R is a principal ideal
domain, we know that $\ker(\tau)$ is free of rank at most n. It follows that

$$\mathrm{rk}(M) = \mathrm{rk}(\ker(\tau)) + \mathrm{rk}(M)$$

and so $\mathrm{rk}(\ker(\tau)) = 0$, that is, $\ker(\tau) = \{0\}$, which implies that $\tau$ is an R-
isomorphism and so S is a basis. □


In general, a basis for a submodule of a free module over a principal ideal 
domain cannot be extended to a basis for the entire module. For example, the set 
{2} is a basis for the submodule 2Z of the Z-module Z, but this set cannot be 
extended to a basis for Z itself. We state without proof the following result 
along these lines. 


Theorem 6.7 Let M be a free R-module of rank n, where R is a principal ideal
domain. Let N be a submodule of M that is free of rank $k \le n$. Then there is a
basis $\mathcal{B}$ for M that contains a subset $S = \{v_1, \dots, v_k\}$ for which
$\{r_1v_1, \dots, r_kv_k\}$ is a basis for N, for some nonzero elements $r_1, \dots, r_k$ of R. □


Torsion-Free and Free Modules 


Let us explore the relationship between the concepts of torsion-free and free. It 
is not hard to see that any free module over an integral domain is torsion-free. 
The converse does not hold, unless we strengthen the hypotheses by requiring 
that the module be finitely generated. 


Theorem 6.8 A finitely generated module over a principal ideal domain is free 
if and only if it is torsion-free. 

Proof. We leave proof that a free module over an integral domain is torsion-free
to the reader. Let $G = \{v_1, \dots, v_n\}$ be a generating set for M. Consider first the
case n = 1, whence $G = \{v\}$. Then G is a basis for M since singleton sets are
linearly independent in a torsion-free module. Hence, M is free.

Now suppose that $G = \{u, v\}$ is a generating set with $u, v \ne 0$. If G is linearly
independent, we are done. If not, then there exist nonzero $r, s \in R$ for which
$ru = sv$. It follows that $sM = s\langle\langle u, v \rangle\rangle \subseteq \langle\langle u \rangle\rangle$ and so $sM$ is a submodule of a
free module and is therefore free by Theorem 6.5. But the map $\tau: M \to sM$
defined by $\tau v = sv$ is an isomorphism because M is torsion-free. Thus M is
also free.

Now we can do the general case. Write

$$G = \{u_1, \dots, u_k, v_1, \dots, v_{n-k}\}$$

where $S = \{u_1, \dots, u_k\}$ is a maximal linearly independent subset of G. (Note
that S is nonempty because singleton sets are linearly independent.)

For each $v_i$, the set $\{u_1, \dots, u_k, v_i\}$ is linearly dependent and so there exist
a nonzero $a_i \in R$ and $r_1, \dots, r_k \in R$ for which

$$a_iv_i + r_1u_1 + \cdots + r_ku_k = 0$$

If $a = a_1 \cdots a_{n-k}$, then

$$aM = a\langle\langle u_1, \dots, u_k, v_1, \dots, v_{n-k} \rangle\rangle \subseteq \langle\langle u_1, \dots, u_k \rangle\rangle$$

and since the latter is a free module, so is $aM$, and therefore so is M. □
The Primary Cyclic Decomposition Theorem 


The first step in the decomposition of a finitely generated module M over a 
principal ideal domain R is an easy one. 


Theorem 6.9 Any finitely generated module M over a principal ideal domain R
is the direct sum of a finitely generated free R-module and a finitely generated
torsion R-module

$M = M_{\text{free}} \oplus M_{\text{tor}}$

The torsion part $M_{\text{tor}}$ is unique, since it must be the set of all torsion elements of
M, whereas the free part $M_{\text{free}}$ is unique only up to isomorphism, that is, the
rank of the free part is unique.

Proof. It is easy to see that the set $M_{\text{tor}}$ of all torsion elements is a submodule of
M and the quotient $M/M_{\text{tor}}$ is torsion-free. Moreover, since M is finitely
generated, so is $M/M_{\text{tor}}$. Hence, Theorem 6.8 implies that $M/M_{\text{tor}}$ is free.
Hence, Theorem 5.6 implies that

$M = M_{\text{tor}} \oplus F$

where $F \approx M/M_{\text{tor}}$ is free.

As to the uniqueness of the torsion part, suppose that $M = T \oplus G$ where T is
torsion and G is free. Then $T \subseteq M_{\text{tor}}$. But if $v = t + g \in M_{\text{tor}}$ for $t \in T$ and
$g \in G$, then $g = v - t \in M_{\text{tor}}$ and so g = 0, since the free module G is
torsion-free, and $v \in T$. Thus, $T = M_{\text{tor}}$.

For the free part, since $M = M_{\text{tor}} \oplus F = M_{\text{tor}} \oplus G$, the submodules F and G
are both complements of $M_{\text{tor}}$ and hence are isomorphic. □
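Note that if $\{w_1, \dots, w_m\}$ is a basis for $M_{\text{free}}$ we can write

$M = \langle\langle w_1 \rangle\rangle \oplus \cdots \oplus \langle\langle w_m \rangle\rangle \oplus M_{\text{tor}}$

where each cyclic submodule $\langle\langle w_i \rangle\rangle$ has zero annihilator. This is a partial
decomposition of M into a direct sum of cyclic submodules.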


The Primary Decomposition 


In view of Theorem 6.9, we turn our attention to the decomposition of finitely 
generated torsion modules M over a principal ideal domain. The first step is to 
decompose M into a direct sum of primary submodules, defined as follows. 




Definition Let p be a prime in R. A p-primary (or just primary) module is a
module whose order is a power of p. □


Theorem 6.10 (The primary decomposition theorem) Let M be a torsion
module over a principal ideal domain R, with order

$\mu = p_1^{e_1} \cdots p_n^{e_n}$

where the $p_i$'s are distinct nonassociate primes in R.
1) M is the direct sum

$M = M_{p_1} \oplus \cdots \oplus M_{p_n}$

where

$M_{p_i} = \frac{\mu}{p_i^{e_i}} M = \{v \in M \mid p_i^{e_i} v = 0\}$

is a primary submodule of order $p_i^{e_i}$. This decomposition of M into primary
submodules is called the primary decomposition of M.
2) The primary decomposition of M is unique up to order of the summands.
That is, if

$M = N_{q_1} \oplus \cdots \oplus N_{q_m}$

where $N_{q_i}$ is primary of order $q_i^{f_i}$ and $q_1, \dots, q_m$ are distinct nonassociate
primes, then m = n and, after a possible reindexing, $N_{q_i} = M_{p_i}$. Hence,
$f_i = e_i$ and $q_i \sim p_i$ for i = 1, ..., n.
3) Two R-modules M and N are isomorphic if and only if the summands in
their primary decompositions are pairwise isomorphic, that is, if

$M = M_{p_1} \oplus \cdots \oplus M_{p_n}$

and

$N = N_{q_1} \oplus \cdots \oplus N_{q_m}$

are primary decompositions, then m = n and, after a possible reindexing,
$M_{p_i} \approx N_{q_i}$ for i = 1, ..., n.
Proof. Let us write $\mu_i = \mu / p_i^{e_i}$ and show first that

$M_{p_i} = \mu_i M = \{\mu_i v \mid v \in M\}$

Since $p_i^{e_i}(\mu_i M) = \mu M = \{0\}$, we have $\mu_i M \subseteq M_{p_i}$. On the other hand, since
$\mu_i$ and $p_i^{e_i}$ are relatively prime, there exist $a, b \in R$ for which

$a\mu_i + b p_i^{e_i} = 1$

and so if $x \in M_{p_i}$ then

$x = (a\mu_i + b p_i^{e_i})x = a\mu_i x \in \mu_i M$

Hence $M_{p_i} = \mu_i M$.
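
The argument above can be checked numerically in the simplest setting. The following is a minimal sketch, assuming $R = \mathbb{Z}$ and $M = \mathbb{Z}_{360}$ (so that $\mu = 360$, $p_i = 2$, $e_i = 3$ and $\mu_i = 45$); the function names `bezout` and `primary_component` are illustrative and do not come from the text.

```python
def bezout(x, y):
    """Extended Euclid: return (g, a, b) with a*x + b*y == g == gcd(x, y)."""
    old_r, r = x, y
    old_a, a = 1, 0
    old_b, b = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_a, a = a, old_a - q * a
        old_b, b = b, old_b - q * b
    return old_r, old_a, old_b


def primary_component(n, p, e):
    """All x in Z_n with p**e * x == 0, i.e. the set called M_p in the text."""
    return {x for x in range(n) if (p**e * x) % n == 0}


# Assumed example: M = Z_360, mu = 360 = 2^3 * 45, p = 2, e = 3.
n, p, e = 360, 2, 3
mu_i = n // p**e                       # mu_i = mu / p^e = 45
g, a, b = bezout(mu_i, p**e)           # a*mu_i + b*p^e == 1
M_p = primary_component(n, p, e)
image = {(mu_i * x) % n for x in range(n)}   # the submodule mu_i * M

# Every x in M_p is recovered as a*mu_i*x, exactly as in the proof.
recovered = all((a * mu_i * x) % n == x for x in M_p)
```

Here `M_p == image` and `recovered` both hold, which mirrors the two inclusions in the proof.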


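For part 1), since $\gcd(\mu_1, \dots, \mu_n) = 1$, there exist scalars $a_i$ for which

$a_1\mu_1 + \cdots + a_n\mu_n = 1$

and so for any $x \in M$,

$x = (a_1\mu_1 + \cdots + a_n\mu_n)x \in \sum_{i=1}^{n} \mu_i M$

Moreover, since $o(\mu_i M) \mid p_i^{e_i}$ and the $p_i^{e_i}$'s are pairwise relatively prime, it
follows that the sum of the submodules $\mu_i M$ is direct, that is,

$M = \mu_1 M \oplus \cdots \oplus \mu_n M = M_{p_1} \oplus \cdots \oplus M_{p_n}$

As to the annihilators, it is clear that $\langle p_i^{e_i} \rangle \subseteq \operatorname{ann}(\mu_i M)$. For the reverse
inclusion, if $r \in \operatorname{ann}(\mu_i M)$, then $r\mu_i \in \operatorname{ann}(M)$ and so $p_i^{e_i}\mu_i \mid r\mu_i$, that is,
$p_i^{e_i} \mid r$ and so $r \in \langle p_i^{e_i} \rangle$. Thus $\operatorname{ann}(\mu_i M) = \langle p_i^{e_i} \rangle$.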


As to uniqueness, we claim that $q = q_1^{f_1} \cdots q_m^{f_m}$ is an order of M. It is clear that q
annihilates M and so $\mu \mid q$. On the other hand, $N_{q_i}$ contains an element $u_i$ of
order $q_i^{f_i}$ and so the sum $v = u_1 + \cdots + u_m$ has order q, which implies that
$q \mid \mu$. Hence, q and $\mu$ are associates.

Unique factorization in R now implies that m = n and, after a suitable
reindexing, that $f_i = e_i$ and $q_i$ and $p_i$ are associates. Hence, $N_{q_i}$ is primary of
order $p_i^{e_i}$. For convenience, we can write $N_{q_i}$ as $N_{p_i}$. Hence,

$N_{p_i} \subseteq \{v \in M \mid p_i^{e_i} v = 0\} = M_{p_i}$

But if

$N_{p_1} \oplus \cdots \oplus N_{p_n} = M_{p_1} \oplus \cdots \oplus M_{p_n}$

and $N_{p_i} \subseteq M_{p_i}$ for all i, we must have $N_{p_i} = M_{p_i}$ for all i.


For part 3), if m = n and $\sigma_i\colon M_{p_i} \approx N_{q_i}$ then the map $\sigma\colon M \to N$ defined by

$\sigma(a_1 + \cdots + a_n) = \sigma_1(a_1) + \cdots + \sigma_n(a_n)$

is an isomorphism and so $M \approx N$. Conversely, suppose that $\sigma\colon M \approx N$. Then
M and N have the same annihilators and therefore the same order

$\mu = p_1^{e_1} \cdots p_n^{e_n}$

Hence, part 1) and part 2) imply that m = n and after a suitable reindexing,
$q_i = p_i$. Moreover, since

$a \in M_{p_i} \Leftrightarrow p_i^{e_i} a = 0 \Leftrightarrow \sigma(p_i^{e_i} a) = 0 \Leftrightarrow p_i^{e_i}\sigma a = 0 \Leftrightarrow \sigma a \in N_{q_i}$

it follows that $\sigma\colon M_{p_i} \approx N_{q_i}$. □
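
As an illustration of part 1), the scalars $a_i\mu_i$ from the proof act as projections onto the primary submodules. The sketch below assumes $R = \mathbb{Z}$ and $M = \mathbb{Z}_n$; the helper names `factorize` and `primary_projections` are hypothetical, and the modular inverse via `pow(x, -1, m)` requires Python 3.8 or later.

```python
def factorize(n):
    """Trial-division factorization of n >= 1: returns {prime: exponent}."""
    factors, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def primary_projections(n):
    """For M = Z_n return {p: c_p}, where x -> c_p * x (mod n) maps M onto M_p.

    Here c_p = a * mu_p with mu_p = n / p^e and a * mu_p == 1 (mod p^e),
    so the c_p sum to 1 modulo n, as in a_1 mu_1 + ... + a_n mu_n = 1.
    """
    proj = {}
    for p, e in factorize(n).items():
        q = p**e
        mu = n // q
        a = pow(mu, -1, q)          # inverse of mu modulo p^e (Python 3.8+)
        proj[p] = (a * mu) % n
    return proj


def decompose(x, n):
    """Split x in Z_n into its primary components {p: x_p}, with sum x_p == x."""
    return {p: (c * x) % n for p, c in primary_projections(n).items()}
```

For example, `decompose(7, 360)` returns one component for each of the primes 2, 3 and 5; the component for p is annihilated by the corresponding prime power and the components add up to 7 modulo 360.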
The Cyclic Decomposition of a Primary Module 


The next step in the decomposition process is to show that a primary module 
can be decomposed into a direct sum of cyclic submodules. While this 
decomposition is not unique (see the exercises), the set of annihilators is unique, 
as we will see. To establish this uniqueness, we use the following result. 


Lemma 6.11 Let M be a module over a principal ideal domain R and let
$p \in R$ be a prime.
1) If pM = {0}, then M is a vector space over the field $R/\langle p \rangle$ with scalar
multiplication defined by

$(r + \langle p \rangle)v = rv$

for all $v \in M$.
2) For any submodule S of M the set

$S^{(p)} = \{v \in S \mid pv = 0\}$

is also a submodule of M and if $M = S \oplus T$, then

$M^{(p)} = S^{(p)} \oplus T^{(p)}$


Proof. For part 1), since p is prime, the ideal $\langle p \rangle$ is maximal and so $R/\langle p \rangle$ is a
field. We leave the proof that M is a vector space over $R/\langle p \rangle$ to the reader. For
part 2), it is straightforward to show that $S^{(p)}$ is a submodule of M. Since
$S^{(p)} \subseteq S$ and $T^{(p)} \subseteq T$ we see that $S^{(p)} \cap T^{(p)} = \{0\}$. Also, if $v \in M^{(p)}$, then
pv = 0. But v = s + t for some $s \in S$ and $t \in T$ and so 0 = pv = ps + pt.
Since $ps \in S$ and $pt \in T$ we deduce that ps = pt = 0, whence $v \in S^{(p)} \oplus T^{(p)}$.
Thus, $M^{(p)} \subseteq S^{(p)} \oplus T^{(p)}$. But the reverse inclusion is manifest. □
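
The lemma can be explored by brute force in a small case. The sketch below assumes $R = \mathbb{Z}$ and takes M to be a direct sum of cyclic groups $\mathbb{Z}_{p^{e_1}} \oplus \cdots \oplus \mathbb{Z}_{p^{e_n}}$; it lists the elements of $M^{(p)}$ and recovers the dimension of $M^{(p)}$ over $\mathbb{Z}_p$ from the count $|M^{(p)}| = p^{\dim}$. The function names are illustrative only.

```python
from itertools import product


def p_torsion_elements(p, exponents):
    """Elements v of Z_{p^e1} x ... x Z_{p^en} with p*v == 0, as tuples."""
    moduli = [p**e for e in exponents]
    return [v for v in product(*(range(m) for m in moduli))
            if all((p * x) % m == 0 for x, m in zip(v, moduli))]


def dimension_over_Zp(p, exponents):
    """Dimension of M^(p) as a vector space over Z_p, read off from its size."""
    size = len(p_torsion_elements(p, exponents))
    dim = 0
    while size > 1:
        size //= p
        dim += 1
    return dim
```

In each case tried, the dimension equals the number of cyclic summands, since each summand contributes a one-dimensional subspace to $M^{(p)}$.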


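Theorem 6.12 (The cyclic decomposition theorem of a primary module) Let
M be a primary finitely generated torsion module over a principal ideal domain
R, with order $p^e$.
1) M is a direct sum

$M = \langle\langle v_1 \rangle\rangle \oplus \cdots \oplus \langle\langle v_n \rangle\rangle$    (6.1)

of cyclic submodules with annihilators $\operatorname{ann}(\langle\langle v_i \rangle\rangle) = \langle p^{e_i} \rangle$, which can be
arranged in ascending order

$\operatorname{ann}(\langle\langle v_1 \rangle\rangle) \subseteq \cdots \subseteq \operatorname{ann}(\langle\langle v_n \rangle\rangle)$

or equivalently,

$e = e_1 \ge e_2 \ge \cdots \ge e_n$

2) As to uniqueness, suppose that M is also the direct sum

$M = \langle\langle u_1 \rangle\rangle \oplus \cdots \oplus \langle\langle u_m \rangle\rangle$

of cyclic submodules with annihilators $\operatorname{ann}(\langle\langle u_i \rangle\rangle) = \langle q^{f_i} \rangle$, arranged in
ascending order

$\operatorname{ann}(\langle\langle u_1 \rangle\rangle) \subseteq \cdots \subseteq \operatorname{ann}(\langle\langle u_m \rangle\rangle)$

or equivalently

$f_1 \ge f_2 \ge \cdots \ge f_m$

Then the two chains of annihilators are identical, that is, m = n and

$\operatorname{ann}(\langle\langle u_i \rangle\rangle) = \operatorname{ann}(\langle\langle v_i \rangle\rangle)$

for all i. Thus, $p \sim q$ and $f_i = e_i$ for all i.
3) Two p-primary R-modules

$M = \langle\langle v_1 \rangle\rangle \oplus \cdots \oplus \langle\langle v_n \rangle\rangle$

and

$N = \langle\langle u_1 \rangle\rangle \oplus \cdots \oplus \langle\langle u_m \rangle\rangle$

are isomorphic if and only if they have the same annihilator chains, that is,
if and only if m = n and, after a possible reindexing,

$\operatorname{ann}(\langle\langle u_i \rangle\rangle) = \operatorname{ann}(\langle\langle v_i \rangle\rangle)$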
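Proof. Let $v_1 \in M$ have order equal to the order of M, that is,

$\operatorname{ann}(v_1) = \operatorname{ann}(M) = \langle p^e \rangle$

Such an element must exist since $o(v) \mid p^e$ for all $v \in M$ and if the order of every
v were a proper divisor of $p^e$, then $p^{e-1}$ would annihilate M.

If we show that $\langle\langle v_1 \rangle\rangle$ is complemented, that is, $M = \langle\langle v_1 \rangle\rangle \oplus S_1$ for some
submodule $S_1$, then since $S_1$ is also a finitely generated primary torsion module
over R, we can repeat the process to get

$M = \langle\langle v_1 \rangle\rangle \oplus \langle\langle v_2 \rangle\rangle \oplus S_2$

where $\operatorname{ann}(v_2) = \langle p^{e_2} \rangle$. We can continue this decomposition:

$M = \langle\langle v_1 \rangle\rangle \oplus \langle\langle v_2 \rangle\rangle \oplus \cdots \oplus \langle\langle v_n \rangle\rangle \oplus S_n$

as long as $S_n \ne \{0\}$. But the ascending sequence of submodules

$\langle\langle v_1 \rangle\rangle \subseteq \langle\langle v_1 \rangle\rangle \oplus \langle\langle v_2 \rangle\rangle \subseteq \cdots$

must terminate since M is Noetherian and so there is an integer n for which
eventually $S_n = \{0\}$, giving (6.1).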
Let $v = v_1$. The direct sum $M_0 = \langle\langle v \rangle\rangle \oplus \{0\}$ clearly exists. Suppose that the
direct sum

$M_k = \langle\langle v \rangle\rangle \oplus S_k$

exists. We claim that if $M_k < M$, then it is possible to find a submodule $S_{k+1}$
for which $S_k < S_{k+1}$ and for which the direct sum $M_{k+1} = \langle\langle v \rangle\rangle \oplus S_{k+1}$ also
exists. This process must also stop after a finite number of steps, giving
$M = \langle\langle v \rangle\rangle \oplus S$ as desired.


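If $M_k < M$ and $u \in M \setminus M_k$ let

$S_{k+1} = \langle\langle S_k, u - \alpha v \rangle\rangle$

for $\alpha \in R$. Then $S_k < S_{k+1}$ since $u \notin M_k$. We wish to show that for some
$\alpha \in R$, the direct sum

$\langle\langle v \rangle\rangle \oplus S_{k+1}$

exists, that is,

$x \in \langle\langle v \rangle\rangle \cap \langle\langle S_k, u - \alpha v \rangle\rangle \Rightarrow x = 0$

Now, there exist scalars a and b for which

$x = av = s + b(u - \alpha v)$

for $s \in S_k$ and so if we find a scalar $\alpha$ for which

$b(u - \alpha v) \in S_k$    (6.2)

then $\langle\langle v \rangle\rangle \cap S_k = \{0\}$ implies that x = 0 and the proof of existence will be
complete.

Solving for bu gives

$bu = (a + \alpha b)v - s \in \langle\langle v \rangle\rangle \oplus S_k = M_k$

so let us consider the ideal of all such scalars:

$I = \{r \in R \mid ru \in M_k\}$

Since $p^e \in I$ and I is principal, we have

$I = \langle p^f \rangle$

for some $f \le e$. Also, f > 0 since $u \notin M_k$ implies that $1 \notin I$.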


Since $b \in I$, we have $b = \beta p^f$ and there exist $d \in R$ and $t \in S_k$ for which

$p^f u = dv + t$

Hence,

$bu = \beta p^f u = \beta(dv + t) = \beta dv + \beta t$

Now we need more information about d. Multiplying the expression for $p^f u$ by
$p^{e-f}$ gives

$0 = p^e u = p^{e-f}(p^f u) = p^{e-f} dv + p^{e-f} t$

and since $\langle\langle v \rangle\rangle \cap S_k = \{0\}$, it follows that $p^{e-f} dv = 0$. Hence, $p^e \mid p^{e-f} d$, that
is, $p^f \mid d$ and so $d = \delta p^f$ for some $\delta \in R$. Now we can write

$bu = \beta\delta p^f v + \beta t = b\delta v + \beta t$

and so

$b(u - \delta v) = \beta t \in S_k$

Thus, we take $\alpha = \delta$ to get (6.2) and that completes the proof of existence.
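For uniqueness, note first that M has orders $p^{e_1}$ and $q^{f_1}$ and so p and q are
associates and $e_1 = f_1$. Next we show that n = m. According to part 2) of
Lemma 6.11,

$M^{(p)} = \langle\langle v_1 \rangle\rangle^{(p)} \oplus \cdots \oplus \langle\langle v_n \rangle\rangle^{(p)}$

and

$M^{(p)} = \langle\langle u_1 \rangle\rangle^{(p)} \oplus \cdots \oplus \langle\langle u_m \rangle\rangle^{(p)}$

where all summands are nonzero. Since $pM^{(p)} = \{0\}$, it follows from Lemma
6.11 that $M^{(p)}$ is a vector space over $R/\langle p \rangle$ and so each of the preceding
decompositions expresses $M^{(p)}$ as a direct sum of one-dimensional vector
subspaces. Hence, $m = \dim(M^{(p)}) = n$.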


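Finally, we show that the exponents $e_i$ and $f_i$ are equal using induction on $e_1$. If
$e_1 = 1$, then $e_i = 1$ for all i and since $f_1 = e_1$, we also have $f_i = 1$ for all i.
Suppose the result is true whenever $e_1 \le k - 1$ and let $e_1 = k$. Write

$(e_1, \dots, e_n) = (e_1, \dots, e_s, 1, \dots, 1), \quad e_s > 1$

and

$(f_1, \dots, f_n) = (f_1, \dots, f_t, 1, \dots, 1), \quad f_t > 1$

Then

$pM = \langle\langle pv_1 \rangle\rangle \oplus \cdots \oplus \langle\langle pv_s \rangle\rangle$

and

$pM = \langle\langle pu_1 \rangle\rangle \oplus \cdots \oplus \langle\langle pu_t \rangle\rangle$

But $p\langle\langle v_1 \rangle\rangle = \langle\langle pv_1 \rangle\rangle$ is a cyclic submodule of M with annihilator $\langle p^{k-1} \rangle$ and so
by the induction hypothesis

$s = t$ and $e_1 = f_1, \dots, e_s = f_s$

which concludes the proof of uniqueness.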


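For part 3), suppose that $\sigma\colon M \approx N$ and M has annihilator chain

$\operatorname{ann}(\langle\langle v_1 \rangle\rangle) \subseteq \cdots \subseteq \operatorname{ann}(\langle\langle v_n \rangle\rangle)$

and N has annihilator chain

$\operatorname{ann}(\langle\langle u_1 \rangle\rangle) \subseteq \cdots \subseteq \operatorname{ann}(\langle\langle u_m \rangle\rangle)$

Then

$N = \sigma M = \langle\langle \sigma v_1 \rangle\rangle \oplus \cdots \oplus \langle\langle \sigma v_n \rangle\rangle$

and so m = n and after a suitable reindexing,

$\operatorname{ann}(\langle\langle v_i \rangle\rangle) = \operatorname{ann}(\langle\langle \sigma v_i \rangle\rangle) = \operatorname{ann}(\langle\langle u_i \rangle\rangle)$

Conversely, suppose that

$M = \langle\langle v_1 \rangle\rangle \oplus \cdots \oplus \langle\langle v_n \rangle\rangle$

and

$N = \langle\langle u_1 \rangle\rangle \oplus \cdots \oplus \langle\langle u_m \rangle\rangle$

have the same annihilator chains, that is, m = n and

$\operatorname{ann}(\langle\langle u_i \rangle\rangle) = \operatorname{ann}(\langle\langle v_i \rangle\rangle)$

Then

$\langle\langle u_i \rangle\rangle \approx R/\operatorname{ann}(\langle\langle u_i \rangle\rangle) = R/\operatorname{ann}(\langle\langle v_i \rangle\rangle) \approx \langle\langle v_i \rangle\rangle$

for all i and so $M \approx N$. □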


The Primary Cyclic Decomposition 


Now we can combine the various decompositions. 


Theorem 6.13 (The primary cyclic decomposition theorem) Let M be a 
finitely generated torsion module over a principal ideal domain R. 


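1) If M has order

$\mu = p_1^{e_1} \cdots p_n^{e_n}$

where the $p_i$'s are distinct nonassociate primes in R, then M can be
uniquely decomposed (up to the order of the summands) into the direct sum

$M = M_{p_1} \oplus \cdots \oplus M_{p_n}$

where

$M_{p_i} = \frac{\mu}{p_i^{e_i}} M = \{v \in M \mid p_i^{e_i} v = 0\}$

is a primary submodule with annihilator $\langle p_i^{e_i} \rangle$. Finally, each primary
submodule $M_{p_i}$ can be written as a direct sum of cyclic submodules, so that

$M = \underbrace{[\langle\langle v_{1,1} \rangle\rangle \oplus \cdots \oplus \langle\langle v_{1,k_1} \rangle\rangle]}_{M_{p_1}} \oplus \cdots \oplus \underbrace{[\langle\langle v_{n,1} \rangle\rangle \oplus \cdots \oplus \langle\langle v_{n,k_n} \rangle\rangle]}_{M_{p_n}}$

where $\operatorname{ann}(\langle\langle v_{i,j} \rangle\rangle) = \langle p_i^{e_{i,j}} \rangle$ and the terms in each cyclic decomposition can
be arranged so that, for each i,

$\operatorname{ann}(\langle\langle v_{i,1} \rangle\rangle) \subseteq \cdots \subseteq \operatorname{ann}(\langle\langle v_{i,k_i} \rangle\rangle)$

or, equivalently,

$e_i = e_{i,1} \ge e_{i,2} \ge \cdots \ge e_{i,k_i}$

2) As for uniqueness, suppose that

$M = \underbrace{[\langle\langle u_{1,1} \rangle\rangle \oplus \cdots \oplus \langle\langle u_{1,j_1} \rangle\rangle]}_{N_{q_1}} \oplus \cdots \oplus \underbrace{[\langle\langle u_{m,1} \rangle\rangle \oplus \cdots \oplus \langle\langle u_{m,j_m} \rangle\rangle]}_{N_{q_m}}$

is also a primary cyclic decomposition of M. Then,
a) The number of summands is the same in both decompositions; in fact,
m = n and after possible reindexing, $k_u = j_u$ for all u.
b) The primary submodules are the same; that is, after possible
reindexing, $q_i \sim p_i$ and $N_{q_i} = M_{p_i}$.
c) For each primary submodule pair $N_{q_i} = M_{p_i}$, the cyclic submodules
have the same annihilator chains; that is, after possible reindexing,

$\operatorname{ann}(\langle\langle u_{i,j} \rangle\rangle) = \operatorname{ann}(\langle\langle v_{i,j} \rangle\rangle)$

for all i, j.
In summary, the primary submodules and annihilator chains are uniquely
determined by the module M.
3) Two R-modules M and N are isomorphic if and only if they have the same
annihilator chains. □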




Elementary Divisors 


Since the chain of annihilators

$\operatorname{ann}(\langle\langle v_{i,j} \rangle\rangle) = \langle p_i^{e_{i,j}} \rangle$

is unique except for order, the multiset $\{p_i^{e_{i,j}}\}$ of generators is uniquely
determined up to associate. The generators $p_i^{e_{i,j}}$ are called the elementary
divisors of M. Note that for each prime $p_i$, the elementary divisor $p_i^{e_{i,1}}$ of largest
exponent is precisely the factor of o(M) associated to $p_i$.


Let us write ElemDiv(M) to denote the multiset of all elementary divisors of
M. Thus, if $r \in$ ElemDiv(M), then any associate of r is also in ElemDiv(M).
We can now say that ElemDiv(M) is a complete invariant for isomorphism.
Technically, the function $M \mapsto$ ElemDiv(M) is the complete invariant, but this
hair is not worth splitting. Also, we could work with a system of distinct
representatives for the associate classes of the elementary divisors, but in
general, there is no way to single out a special representative.
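
Over $R = \mathbb{Z}$ there is a natural representative in each associate class, namely the positive prime power, and the elementary divisors are easy to compute. The following is a minimal sketch under that assumption, for a module given as $\mathbb{Z}_{n_1} \oplus \cdots \oplus \mathbb{Z}_{n_k}$; the names `factorize` and `elementary_divisors` are illustrative, and the multiset is modeled with `collections.Counter`.

```python
from collections import Counter


def factorize(n):
    """Trial-division factorization of n >= 1: returns {prime: exponent}."""
    factors, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def elementary_divisors(moduli):
    """Multiset of elementary divisors of Z_{n1} (+) ... (+) Z_{nk}.

    Each cyclic summand Z_n splits into primary cyclic pieces Z_{p^e},
    one for each prime power p^e exactly dividing n. The result is a
    Counter keyed by (p, e), so duplicates are kept.
    """
    divisors = Counter()
    for n in moduli:
        for p, e in factorize(n).items():
            divisors[(p, e)] += 1
    return divisors
```

With this sketch, $\mathbb{Z}_{12} \oplus \mathbb{Z}_{18}$ and $\mathbb{Z}_{6} \oplus \mathbb{Z}_{36}$ have the same multiset $\{2^2, 2, 3^2, 3\}$, whereas $\mathbb{Z}_4$ and $\mathbb{Z}_2 \oplus \mathbb{Z}_2$ do not.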


Theorem 6.14 Let R be a principal ideal domain. The multiset ElemDiv(M) is
a complete invariant for isomorphism of finitely generated torsion R-modules,
that is,

$M \approx N \Leftrightarrow \operatorname{ElemDiv}(M) = \operatorname{ElemDiv}(N)$ □

We have seen (Theorem 6.2) that if

$M = A \oplus B$

then

$o(M) = \operatorname{lcm}(o(A), o(B))$


Let us now compare the elementary divisors of M to those of A and B. 


Theorem 6.15 Let M be a finitely generated torsion module over a principal
ideal domain and suppose that

$M = A \oplus B$

1) The primary cyclic decomposition of M is the direct sum of the primary
cyclic decompositions of A and B; that is, if

$A = \bigoplus \langle\langle a_i \rangle\rangle$ and $B = \bigoplus \langle\langle b_j \rangle\rangle$

are the primary cyclic decompositions of A and B, respectively, then

$M = \left(\bigoplus \langle\langle a_i \rangle\rangle\right) \oplus \left(\bigoplus \langle\langle b_j \rangle\rangle\right)$

is the primary cyclic decomposition of M.
2) The elementary divisors of M are

$\operatorname{ElemDiv}(M) = \operatorname{ElemDiv}(A) \cup \operatorname{ElemDiv}(B)$

where the union is a multiset union; that is, we keep all duplicate
members. □


The Invariant Factor Decomposition 


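According to Theorem 6.4, if S and T are cyclic submodules with relatively
prime orders, then $S \oplus T$ is a cyclic submodule whose order is the product of
the orders of S and T. Accordingly, in the primary cyclic decomposition of M,

$M = \underbrace{[\langle\langle v_{1,1} \rangle\rangle \oplus \cdots \oplus \langle\langle v_{1,k_1} \rangle\rangle]}_{M_{p_1}} \oplus \cdots \oplus \underbrace{[\langle\langle v_{n,1} \rangle\rangle \oplus \cdots \oplus \langle\langle v_{n,k_n} \rangle\rangle]}_{M_{p_n}}$

with elementary divisors $p_i^{e_{i,j}}$ satisfying

$e_i = e_{i,1} \ge e_{i,2} \ge \cdots \ge e_{i,k_i}$    (6.3)

we can combine cyclic summands with relatively prime orders. One judicious
way to do this is to take the leftmost (highest-order) cyclic submodules from
each group to get

$D_1 = \langle\langle v_{1,1} \rangle\rangle \oplus \cdots \oplus \langle\langle v_{n,1} \rangle\rangle$

and repeat the process

$D_2 = \langle\langle v_{1,2} \rangle\rangle \oplus \cdots \oplus \langle\langle v_{n,2} \rangle\rangle$
$D_3 = \langle\langle v_{1,3} \rangle\rangle \oplus \cdots \oplus \langle\langle v_{n,3} \rangle\rangle$

Of course, some summands may be missing here since different primary
modules $M_{p_i}$ do not necessarily have the same number of summands. In any
case, the result of this regrouping and combining is a decomposition of the form

$M = D_1 \oplus \cdots \oplus D_m$

which is called an invariant factor decomposition of M.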


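For example, suppose that

$M = [\langle\langle v_{1,1} \rangle\rangle \oplus \langle\langle v_{1,2} \rangle\rangle] \oplus [\langle\langle v_{2,1} \rangle\rangle] \oplus [\langle\langle v_{3,1} \rangle\rangle \oplus \langle\langle v_{3,2} \rangle\rangle \oplus \langle\langle v_{3,3} \rangle\rangle]$

Then the resulting regrouping and combining gives

$M = \underbrace{[\langle\langle v_{1,1} \rangle\rangle \oplus \langle\langle v_{2,1} \rangle\rangle \oplus \langle\langle v_{3,1} \rangle\rangle]}_{D_1} \oplus \underbrace{[\langle\langle v_{1,2} \rangle\rangle \oplus \langle\langle v_{3,2} \rangle\rangle]}_{D_2} \oplus \underbrace{[\langle\langle v_{3,3} \rangle\rangle]}_{D_3}$

As to the orders of the summands, referring to (6.3), if $D_k$ has order $d_k$, then
since the highest powers of each prime $p_i$ are taken for $d_1$, the second-highest
for $d_2$ and so on, we conclude that

$d_m \mid d_{m-1} \mid \cdots \mid d_2 \mid d_1$    (6.4)

or equivalently,

$\operatorname{ann}(D_1) \subseteq \operatorname{ann}(D_2) \subseteq \cdots$

The numbers $d_i$ are called invariant factors of the decomposition.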


For instance, in the example above suppose that the elementary divisors are

$p_1^3, \; p_1^2, \; p_2, \; p_3^3, \; p_3^3, \; p_3$

Then the invariant factors are

$d_1 = p_1^3 p_2 p_3^3$
$d_2 = p_1^2 p_3^3$
$d_3 = p_3$

The process described above that passes from a sequence $p_i^{e_{i,j}}$ of elementary
divisors in order (6.3) to a sequence of invariant factors in order (6.4) is
reversible. The inverse process takes a sequence $d_1, \dots, d_m$ satisfying (6.4),
factors each $d_i$ into a product of distinct nonassociate prime powers with the
primes in the same order and then "peels off" like prime powers from the left.
(The reader may wish to try it on the example above.)
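
Both directions of this process can be sketched in a few lines. The code below assumes $R = \mathbb{Z}$ and represents an elementary divisor $p^e$ as the pair `(p, e)`; the function names are hypothetical. As test data it uses six elementary divisors with exponents 3, 2 for the first prime, 1 for the second and 3, 3, 1 for the third, specialized to the primes 2, 3 and 5.

```python
from collections import defaultdict


def factorize(n):
    """Trial-division factorization of n >= 1: returns {prime: exponent}."""
    factors, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def invariant_factors(elem_divs):
    """From a list of (p, e) pairs, build [d_1, d_2, ...] with d_{k+1} | d_k."""
    by_prime = defaultdict(list)
    for p, e in elem_divs:
        by_prime[p].append(e)
    for exps in by_prime.values():
        exps.sort(reverse=True)           # order (6.3): largest exponent first
    m = max((len(exps) for exps in by_prime.values()), default=0)
    result = []
    for k in range(m):
        d = 1
        for p, exps in by_prime.items():
            if k < len(exps):             # some primes may have run out
                d *= p**exps[k]
        result.append(d)
    return result


def elementary_divisors_from_invariants(inv_factors):
    """Inverse process: factor each d_k and peel off its prime powers."""
    result = []
    for d in inv_factors:
        result.extend(factorize(d).items())
    return sorted(result)


divs = [(2, 3), (2, 2), (3, 1), (5, 3), (5, 3), (5, 1)]
factors = invariant_factors(divs)         # [3000, 500, 5]
```

Here $3000 = 2^3 \cdot 3 \cdot 5^3$, $500 = 2^2 \cdot 5^3$ and 5 satisfy $5 \mid 500 \mid 3000$, and applying the inverse function to them returns the original multiset of prime powers.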


This fact, together with Theorem 6.4, implies that primary cyclic 
decompositions and invariant factor decompositions are essentially equivalent. 
Therefore, since the multiset of elementary divisors of M is unique up to 
associate, the multiset of invariant factors of M is also unique up to associate. 
Furthermore, the multiset of invariant factors is a complete invariant for 
isomorphism. 


Theorem 6.16 (The invariant factor decomposition theorem) Let M be a
finitely generated torsion module over a principal ideal domain R. Then

$M = D_1 \oplus \cdots \oplus D_m$

where $D_i$ is a cyclic submodule of M, with order $d_i$, where

$d_m \mid d_{m-1} \mid \cdots \mid d_2 \mid d_1$

This decomposition is called an invariant factor decomposition of M and the
scalars $d_i$ are called the invariant factors of M.
1) The multiset of invariant factors is uniquely determined up to associate by
the module M.
2) The multiset of invariant factors is a complete invariant for isomorphism. □


The annihilators of an invariant factor decomposition are called the invariant
ideals of M. The chain of invariant ideals is unique, as is the chain of
annihilators in the primary cyclic decomposition. Note that $d_1$ is an order of M,
that is,

$\operatorname{ann}(M) = \langle d_1 \rangle$

Note also that the product

$\gamma = d_1 \cdots d_m$

of the invariant factors of M has some nice properties. For example, $\gamma$ is the
product of all the elementary divisors of M. We will see in a later chapter that
in the context of a linear operator $\tau$ on a vector space, $\gamma$ is the characteristic
polynomial of $\tau$.


Characterizing Cyclic Modules 


The primary cyclic decomposition can be used to characterize cyclic modules 
via their elementary divisors. 


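Theorem 6.17 Let M be a finitely generated torsion module over a principal
ideal domain, with order

$\mu = p_1^{e_1} \cdots p_n^{e_n}$

The following are equivalent:
1) M is cyclic.
2) M is the direct sum

$M = \langle\langle v_1 \rangle\rangle \oplus \cdots \oplus \langle\langle v_n \rangle\rangle$

of primary cyclic submodules $\langle\langle v_i \rangle\rangle$ of order $p_i^{e_i}$.
3) The elementary divisors of M are precisely the prime power factors of $\mu$:

$\operatorname{ElemDiv}(M) = \{p_1^{e_1}, \dots, p_n^{e_n}\}$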


Proof. Suppose that M is cyclic. Then the primary decomposition of M is a 
primary cyclic decomposition, since any submodule of a cyclic module is cyclic. 
Hence, 1) implies 2). Conversely, if 2) holds, then since the orders are relatively 
prime, Theorem 6.4 implies that M is cyclic. We leave the rest of the proof to 
the reader. 
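
For finite abelian groups, the equivalence of 1) and 3) can be tested by brute force. The sketch below assumes $R = \mathbb{Z}$ and a module of the form $\mathbb{Z}_{n_1} \oplus \cdots \oplus \mathbb{Z}_{n_k}$; it compares the elementary divisor criterion with a direct search for a generator. All function names are illustrative.

```python
from math import gcd
from itertools import product


def factorize(n):
    """Trial-division factorization of n >= 1: returns {prime: exponent}."""
    factors, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def lcm(a, b):
    return a * b // gcd(a, b)


def is_cyclic_by_divisors(moduli):
    """Criterion 3): every prime occurs in exactly one elementary divisor."""
    primes = []
    for n in moduli:
        primes.extend(factorize(n).keys())
    return len(primes) == len(set(primes))


def is_cyclic_brute_force(moduli):
    """Search for an element whose additive order equals the group size."""
    size = 1
    for n in moduli:
        size *= n
    for v in product(*(range(n) for n in moduli)):
        order = 1
        for x, n in zip(v, moduli):
            order = lcm(order, n // gcd(x, n))   # order of x in Z_n
        if order == size:
            return True
    return False
```

For instance, both tests accept $\mathbb{Z}_4 \oplus \mathbb{Z}_9$ and reject $\mathbb{Z}_2 \oplus \mathbb{Z}_4$.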


Indecomposable Modules 


The primary cyclic decomposition of M is a decomposition of M into a direct 
sum of submodules that cannot be further decomposed. In fact, this 
characterizes the primary cyclic decomposition of M. Before justifying these 
statements, we make the following definition. 


Definition A module M is indecomposable if it cannot be written as a direct
sum of proper submodules. □



We leave proof of the following as an exercise. 


Theorem 6.18 Let M be a finitely generated torsion module over a principal 
ideal domain. The following are equivalent: 

1) M is indecomposable 

2) M is primary cyclic 

3) M has only one elementary divisor:

$\operatorname{ElemDiv}(M) = \{p^e\}$ □


Thus, the primary cyclic decomposition of M is a decomposition of M into a 
direct sum of indecomposable modules. Conversely, if 


$M = A_1 \oplus \cdots \oplus A_m$


is a decomposition of M into a direct sum of indecomposable submodules, then 
each submodule A; is primary cyclic and so this is the primary cyclic 
decomposition of M. 


Indecomposable Submodules of Prime Order 


Readers acquainted with group theory know that any group of prime order is 
cyclic. However, as mentioned earlier, the order of a module corresponds to the 
smallest exponent of a group, not to the order of a group. Indeed, there are 
modules of prime order that are not cyclic. Nevertheless, cyclic modules of 
prime order are important. 


Indeed, if M is a finitely generated torsion module over a principal ideal 
domain, with order u, then each prime factor p of u gives rise to a cyclic 
submodule W of M whose order is p and so W is also indecomposable. 
Unfortunately, W need not be complemented and so we cannot use it to 
decompose M. Nevertheless, the theorem is still useful, as we will see in a later 
chapter. 


Theorem 6.19 Let M be a finitely generated torsion module over a principal 
ideal domain, with order u. If p is a prime divisor of u, then M has a cyclic 
(equivalently, indecomposable) submodule W of prime order p. 

Proof. If $\mu = pq$, then there is a $v \in M$ for which $w = qv \ne 0$ but pw = 0.
Then $W = \langle\langle w \rangle\rangle$ is annihilated by p and so $o(w) \mid p$. But p is prime and
$o(w) \ne 1$ and so o(w) = p. Since W has prime order, Theorem 6.18 implies
that W is cyclic if and only if it is indecomposable. □
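
The construction in this proof is easy to carry out by machine. The following sketch assumes $R = \mathbb{Z}$ and $M = \mathbb{Z}_{n_1} \oplus \cdots \oplus \mathbb{Z}_{n_k}$, whose order is $\mu = \operatorname{lcm}(n_1, \dots, n_k)$, and that the argument `p` is a prime; the function name is hypothetical.

```python
from math import gcd
from itertools import product


def lcm(a, b):
    return a * b // gcd(a, b)


def element_of_prime_order(moduli, p):
    """Return w = q*v != 0 with p*w == 0, where mu = p*q is the order of M.

    M is Z_{n1} x ... x Z_{nk} with elements stored as tuples; p is
    assumed to be a prime dividing mu = lcm(n1, ..., nk).
    """
    mu = 1
    for n in moduli:
        mu = lcm(mu, n)
    if mu % p != 0:
        raise ValueError("p must divide the order of M")
    q = mu // p
    # q does not annihilate M (mu is the order), so some v has q*v != 0.
    for v in product(*(range(n) for n in moduli)):
        w = tuple((q * x) % n for x, n in zip(v, moduli))
        if any(w):
            return w
    raise AssertionError("unreachable when mu is the order of M")
```

For example, in $\mathbb{Z}_4 \oplus \mathbb{Z}_6$ (order 12) the call with p = 3 yields a nonzero element annihilated by 3, which generates a cyclic submodule of order 3.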


Exercises 


1. Show that any free module over an integral domain is torsion-free.
2. Let M be a finitely generated torsion module over a principal ideal domain.
   Prove that the following are equivalent:
   a) M is indecomposable
   b) M has only one elementary divisor (including multiplicity)
   c) M is cyclic of prime power order.
3. Let R be a principal ideal domain and R* the field of quotients. Then R* is
   an R-module. Prove that any nonzero finitely generated submodule of R*
   is a free module of rank 1.
4. Let R be a principal ideal domain. Let M be a finitely generated torsion-free
   R-module. Suppose that N is a submodule of M for which N is a free
   R-module of rank 1 and M/N is a torsion module. Prove that M is a free
   R-module of rank 1.
5. Show that the primary cyclic decomposition of a torsion module over a
   principal ideal domain is not unique (even though the elementary divisors
   are).
6. Show that if M is a finitely generated R-module where R is a principal
   ideal domain, then the free summand in the decomposition $M = F \oplus M_{\text{tor}}$
   need not be unique.
7. If $\langle\langle v \rangle\rangle$ is a cyclic R-module of order a show that the map $\tau\colon R \to \langle\langle v \rangle\rangle$
   defined by $\tau r = rv$ is a surjective R-homomorphism with kernel $\langle a \rangle$ and so
   $\langle\langle v \rangle\rangle \approx R/\langle a \rangle$.
8. If R is an integral domain with the property that all submodules of cyclic
   R-modules are cyclic, show that R is a principal ideal domain.
9. Suppose that F is a finite field and let F* be the set of all nonzero elements
   of F.
   a) Show that if $p(x) \in F[x]$ is a nonconstant polynomial over F and if
      $r \in F$ is a root of p(x), then x - r is a factor of p(x).
   b) Prove that a nonconstant polynomial $p(x) \in F[x]$ of degree n can have
      at most n distinct roots in F.
   c) Use the invariant factor or primary cyclic decomposition of a finite
      $\mathbb{Z}$-module to prove that F* is cyclic.
10. Let R be a principal ideal domain. Let $M = \langle\langle v \rangle\rangle$ be a cyclic R-module
    with order $\alpha$. We have seen that any submodule of M is cyclic. Prove that
    for each $\beta \in R$ such that $\beta \mid \alpha$ there is a unique submodule of M of order
    $\beta$.
11. Suppose that M is a free module of finite rank over a principal ideal
    domain R. Let N be a submodule of M. If M/N is torsion, prove that
    rk(N) = rk(M).
12. Let F[x] be the ring of polynomials over a field F and let F'[x] be the ring
    of all polynomials in F[x] that have coefficient of x equal to 0. Then F[x]
    is an F'[x]-module. Show that F[x] is finitely generated and torsion-free
    but not free. Is F'[x] a principal ideal domain?
13. Show that the rational numbers $\mathbb{Q}$ form a torsion-free $\mathbb{Z}$-module that is not
    free.


More on Complemented Submodules 


14. Let R be a principal ideal domain and let M be a free R-module.
    a) Prove that a submodule N of M is complemented if and only if M/N
       is free.
    b) If M is also finitely generated, prove that N is complemented if and
       only if M/N is torsion-free.
15. Let M be a free module of finite rank over a principal ideal domain R.
    a) Prove that if N is a complemented submodule of M, then
       rk(N) = rk(M) if and only if N = M.
    b) Show that this need not hold if N is not complemented.
    c) Prove that N is complemented if and only if any basis for N can be
       extended to a basis for M.
16. Let M and N be free modules of finite rank over a principal ideal domain
    R. Let $\tau\colon M \to N$ be an R-homomorphism.
    a) Prove that $\ker(\tau)$ is complemented.
    b) What about $\operatorname{im}(\tau)$?
    c) Prove that

       $\operatorname{rk}(M) = \operatorname{rk}(\ker(\tau)) + \operatorname{rk}(\operatorname{im}(\tau)) = \operatorname{rk}(\ker(\tau)) + \operatorname{rk}\left(\frac{M}{\ker(\tau)}\right)$

    d) If $\tau$ is surjective, then $\tau$ is an isomorphism if and only if
       rk(M) = rk(N).
    e) If L is a submodule of M and if M/L is free, then

       $\operatorname{rk}\left(\frac{M}{L}\right) = \operatorname{rk}(M) - \operatorname{rk}(L)$

17. A submodule N of a module M is said to be pure in M if whenever
    $v \in M \setminus N$, then $rv \notin N$ for all nonzero $r \in R$.
    a) Show that N is pure if and only if $v \in N$ and v = rw for $r \in R$ implies
       $w \in N$.
    b) Show that N is pure if and only if M/N is torsion-free.
    c) If R is a principal ideal domain and M is finitely generated, prove that
       N is pure if and only if M/N is free.
    d) If L and N are pure submodules of M, then so are $L \cap N$ and $L \cup N$.
       What about L + N?
    e) If N is pure in M, then show that $L \cap N$ is pure in L for any
       submodule L of M.
18. Let M be a free module of finite rank over a principal ideal domain R. Let
    L and N be submodules of M with L complemented in M. Prove that

    $\operatorname{rk}(L + N) + \operatorname{rk}(L \cap N) = \operatorname{rk}(L) + \operatorname{rk}(N)$


Chapter 7 
The Structure of a Linear Operator 


In this chapter, we study the structure of a linear operator on a finite- 
dimensional vector space, using the powerful module decomposition theorems 
of the previous chapter. Unless otherwise noted, all vector spaces will be 
assumed to be finite-dimensional. 


Let V be a finite-dimensional vector space. Let us recall two earlier theorems
(Theorem 2.19 and Theorem 2.20).


Theorem 7.1 Let V be a vector space of dimension n.
1) Two $n \times n$ matrices A and B are similar (written $A \sim B$) if and only if
they represent the same linear operator $\tau \in \mathcal{L}(V)$, but possibly with
respect to different ordered bases. In this case, the matrices A and B
represent exactly the same set of linear operators in $\mathcal{L}(V)$.
2) Two linear operators $\tau$ and $\sigma$ on V are similar (written $\tau \sim \sigma$) if and
only if there is a matrix $A \in \mathcal{M}_n$ that represents both operators, but with
respect to possibly different ordered bases. In this case, $\tau$ and $\sigma$ are
represented by exactly the same set of matrices in $\mathcal{M}_n$. □


Theorem 7.1 implies that the matrices that represent a given linear operator are 
precisely the matrices that lie in one similarity class. Hence, in order to uniquely 
represent all linear operators on V, we would like to find a set consisting of one 
simple representative of each similarity class, that is, a set of simple canonical 
forms for similarity. 


One of the simplest types of matrix is the diagonal matrix. However, these are 
too simple, since some operators cannot be represented by a diagonal matrix. A 
less simple type of matrix is the upper triangular matrix. However, these are not 
simple enough: Every operator (over an algebraically closed field) can be 
represented by an upper triangular matrix but some operators can be represented 
by more than one upper triangular matrix. 




This gives rise to two different directions for further study. First, we can search 
for a characterization of those linear operators that can be represented by 
diagonal matrices. Such operators are called diagonalizable. Second, we can 
search for a different type of “simple” matrix that does provide a set of 
canonical forms for similarity. We will pursue both of these directions. 


The Module Associated with a Linear Operator 
If $\tau \in \mathcal{L}(V)$, we will think of V not only as a vector space over a field F but
also as a module over F[x], with scalar multiplication defined by

$p(x)v = p(\tau)(v)$

We will write $V_\tau$ to indicate the dependence on $\tau$. Thus, $V_\tau$ and $V_\sigma$ are modules
with the same ring of scalars F[x], although with different scalar multiplication
if $\tau \ne \sigma$.
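
The scalar multiplication $p(x)v = p(\tau)(v)$ can be made concrete once the operator is given by a matrix. The following is a minimal sketch, assuming operators on $F^n$ are represented by square matrices stored as lists of rows and polynomials by coefficient lists `[c0, c1, ..., cd]`; the names `mat_vec` and `poly_act` are illustrative.

```python
def mat_vec(A, v):
    """Matrix-vector product, with A stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def poly_act(coeffs, A, v):
    """Compute p(A) v for p(x) = c0 + c1*x + ... + cd*x^d by Horner's rule.

    This is the module action p(x)v in V_tau when A represents tau.
    """
    result = [0] * len(v)
    for c in reversed(coeffs):
        result = mat_vec(A, result)                      # multiply by x
        result = [r + c * x for r, x in zip(result, v)]  # add c * v
    return result


tau = [[2, 0], [0, 3]]
sigma = [[0, 1], [1, 0]]
p = [6, -5, 1]                        # p(x) = x^2 - 5x + 6 = (x - 2)(x - 3)
v = [1, 1]

in_V_tau = poly_act(p, tau, v)        # [0, 0]
in_V_sigma = poly_act(p, sigma, v)    # [2, 2]
```

The same scalar p(x) and the same vector v give different products in $V_\tau$ and $V_\sigma$, which illustrates that the module structure depends on the operator.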


Our plan is to interpret the concepts of the previous chapter for the module $V_\tau$.
First, if dim(V) = n, then $\dim(\mathcal{L}(V)) = n^2$. This implies that $V_\tau$ is a torsion
module. In fact, the $n^2 + 1$ vectors

$\iota, \tau, \tau^2, \dots, \tau^{n^2}$

(where $\iota$ is the identity operator) are linearly dependent in $\mathcal{L}(V)$, which implies
that $p(\tau) = 0$ for some nonzero
polynomial $p(x) \in F[x]$. Hence, $p(x) \in \operatorname{ann}(V_\tau)$ and so $\operatorname{ann}(V_\tau)$ is a nonzero
principal ideal of F[x].


Also, since V is finitely generated as a vector space, it is, a fortiori, finitely
generated as an F[x]-module. Thus, $V_\tau$ is a finitely generated torsion module
over a principal ideal domain F[x] and so we may apply the decomposition
theorems of the previous chapter. In the first part of this chapter, we embark on
a "translation project" to translate the powerful results of the previous chapter
into the language of the modules $V_\tau$.


Let us first characterize when two modules $V_\tau$ and $V_\sigma$ are isomorphic.

Theorem 7.2 If $\tau, \sigma \in \mathcal{L}(V)$, then

$V_\tau \approx V_\sigma \Leftrightarrow \tau \sim \sigma$

In particular, $\phi\colon V_\tau \to V_\sigma$ is a module isomorphism if and only if $\phi$ is a vector
space automorphism of V satisfying

$\sigma = \phi\tau\phi^{-1}$

Proof. Suppose that $\phi\colon V_\tau \to V_\sigma$ is a module isomorphism. Then for $v \in V$,

$\phi(xv) = x(\phi v)$

which is equivalent to

$\phi(\tau v) = \sigma(\phi v)$

and since $\phi$ is bijective, this is equivalent to

$(\phi\tau\phi^{-1})v = \sigma v$

that is, $\sigma = \phi\tau\phi^{-1}$. Since a module isomorphism from $V_\tau$ to $V_\sigma$ is a vector space
isomorphism as well, the result follows.

For the converse, suppose that $\phi$ is a vector space automorphism of V and
$\sigma = \phi\tau\phi^{-1}$, that is, $\phi\tau = \sigma\phi$. Then

$\phi(x^k v) = \phi(\tau^k v) = \sigma^k(\phi v) = x^k(\phi v)$

and the F-linearity of $\phi$ implies that for any polynomial $p(x) \in F[x]$,

$\phi(p(\tau)v) = p(\sigma)\phi v$

Hence, $\phi$ is a module isomorphism from $V_\tau$ to $V_\sigma$. □
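
The key identity in the converse, that a vector space automorphism conjugating one operator into the other commutes with the action of every polynomial, can be checked on a small example. The sketch below assumes $2 \times 2$ integer matrices, with an automorphism chosen to have an integer inverse so that all arithmetic is exact; the matrices and helper names are illustrative choices, not taken from the text.

```python
def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def poly_act(coeffs, A, v):
    """p(A) v for p(x) = c0 + c1*x + ... + cd*x^d, by Horner's rule."""
    result = [0] * len(v)
    for c in reversed(coeffs):
        result = mat_vec(A, result)
        result = [r + c * x for r, x in zip(result, v)]
    return result


tau = [[1, 2], [3, 4]]
phi = [[1, 1], [0, 1]]
phi_inv = [[1, -1], [0, 1]]                        # inverse of phi
sigma = mat_mul(mat_mul(phi, tau), phi_inv)        # sigma = phi tau phi^{-1}

p = [7, 0, -3, 2]                                  # p(x) = 7 - 3x^2 + 2x^3
v = [5, -1]
lhs = mat_vec(phi, poly_act(p, tau, v))            # phi(p(tau) v)
rhs = poly_act(p, sigma, mat_vec(phi, v))          # p(sigma)(phi v)
```

The two sides agree, as the proof predicts; changing `p` or `v` does not affect the equality.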
Submodules and Invariant Subspaces 


There is a simple connection between the submodules of the F[x]-module $V_\tau$
and the subspaces of the vector space V. Recall that a subspace S of V is
$\tau$-invariant if $\tau S \subseteq S$.


Theorem 7.3 A subset $S \subseteq V$ is a submodule of $V_\tau$ if and only if S is a
$\tau$-invariant subspace of V. □
Orders and the Minimal Polynomial 
We have seen that the annihilator of $V_\tau$,

$\operatorname{ann}(V_\tau) = \{p(x) \in F[x] \mid p(x)V_\tau = \{0\}\}$

is a nonzero principal ideal of F[x], say

$\operatorname{ann}(V_\tau) = \langle m(x) \rangle$

Since the elements of the base ring F[x] of $V_\tau$ are polynomials, for the first time
in our study of modules there is a logical choice among all scalars in a given
associate class: Each associate class contains exactly one monic polynomial.


Definition Let $\tau \in \mathcal{L}(V)$. The unique monic order of $V_\tau$ is called the minimal
polynomial for $\tau$ and is denoted by $m_\tau(x)$ or $\min(\tau)$. Thus,

$\operatorname{ann}(V_\tau) = \langle m_\tau(x) \rangle$ □


In treatments of linear algebra that do not emphasize the role of the module V,, 
the minimal polynomial of a linear operator 7 is simply defined as the unique 


166 Advanced Linear Algebra 


monic polynomial $m_\tau(x)$ of smallest degree for which $m_\tau(\tau) = 0$. This
definition is equivalent to our definition.


The concept of minimal polynomial is also defined for matrices. The minimal polynomial $m_A(x)$ of a matrix $A \in M_n(F)$ is defined as the minimal polynomial of the multiplication operator $\tau_A$. Equivalently, $m_A(x)$ is the unique monic polynomial $p(x) \in F[x]$ of smallest degree for which $p(A) = 0$.
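
The matrix form of the definition suggests a direct computation: look for the first power of $A$ that is a linear combination of the lower powers. The following is a minimal sketch of that idea over the rationals, not a method taken from the text. The helper names `mat_mul`, `solve_combination` and `minimal_polynomial` are our own, and exact arithmetic with `Fraction` is an assumption made to avoid rounding.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def flatten(X):
    """View an n x n matrix as a vector of length n*n."""
    return [entry for row in X for entry in row]


def solve_combination(vectors, target):
    """Return c with sum(c[k] * vectors[k]) == target, or None if none exists."""
    m, k = len(target), len(vectors)
    rows = [[Fraction(vectors[j][i]) for j in range(k)] + [Fraction(target[i])]
            for i in range(m)]
    pivots, r = [], 0
    for col in range(k):
        piv = next((i for i in range(r, m) if rows[i][col] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        lead = rows[r][col]
        rows[r] = [x / lead for x in rows[r]]
        for i in range(m):
            if i != r and rows[i][col] != 0:
                f = rows[i][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(col)
        r += 1
    # A zero row with a nonzero right-hand side means the system is inconsistent.
    if any(all(x == 0 for x in row[:-1]) and row[-1] != 0 for row in rows):
        return None
    c = [Fraction(0)] * k
    for i, col in enumerate(pivots):
        c[col] = rows[i][-1]
    return c


def minimal_polynomial(A):
    """Coefficients [a_0, ..., a_{d-1}, 1] of the monic minimal polynomial of A."""
    n = len(A)
    powers = [[[Fraction(int(i == j)) for j in range(n)] for i in range(n)]]
    while True:
        nxt = mat_mul(powers[-1], A)  # next power A^d
        c = solve_combination([flatten(P) for P in powers], flatten(nxt))
        if c is not None:
            # A^d = sum c_k A^k, so m(x) = x^d - sum c_k x^k
            return [-x for x in c] + [Fraction(1)]
        powers.append(nxt)
```

For example, `minimal_polynomial([[0, -1], [1, 0]])` returns the coefficients of $x^2 + 1$, while a scalar matrix such as $2I$ has minimal polynomial $x - 2$ regardless of its size.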


Theorem 7.4 

1) If $\tau \sim \sigma$ are similar linear operators on $V$, then $m_\tau(x) = m_\sigma(x)$. Thus, the minimal polynomial is an invariant under similarity of operators.

2) If $A \sim B$ are similar matrices, then $m_A(x) = m_B(x)$. Thus, the minimal polynomial is an invariant under similarity of matrices.

3) The minimal polynomial of $\tau \in \mathcal{L}(V)$ is the same as the minimal polynomial of any matrix that represents $\tau$. □


Cyclic Submodules and Cyclic Subspaces 
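Let us now look at the cyclic submodules of $V_\tau$:
$$\langle\langle v \rangle\rangle = F[x]v = \{p(\tau)(v) \mid p(x) \in F[x]\}$$
which are $\tau$-invariant subspaces of $V$. Let $m(x)$ be the minimal polynomial of $\tau|_{\langle\langle v \rangle\rangle}$ and suppose that $\deg(m(x)) = n$. If $p(x)v \in \langle\langle v \rangle\rangle$, then writing
$$p(x) = q(x)m(x) + r(x)$$
where $\deg r(x) < \deg m(x)$ gives
$$p(x)v = [q(x)m(x) + r(x)]v = r(x)v$$
and so
$$\langle\langle v \rangle\rangle = \{r(x)v \mid \deg r(x) < n\}$$
Hence, the set
$$\mathcal{B} = \{v, xv, \ldots, x^{n-1}v\} = \{v, \tau v, \ldots, \tau^{n-1}v\}$$
spans the vector space $\langle\langle v \rangle\rangle$. To see that $\mathcal{B}$ is a basis for $\langle\langle v \rangle\rangle$, note that any linear combination of the vectors in $\mathcal{B}$ has the form $r(x)v$ for $\deg(r(x)) < n$ and so is equal to $0$ if and only if $r(x) = 0$. Thus, $\mathcal{B}$ is an ordered basis for $\langle\langle v \rangle\rangle$.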


Definition Let $\tau \in \mathcal{L}(V)$. A $\tau$-invariant subspace $S$ of $V$ is $\tau$-cyclic if $S$ has a basis of the form
$$\mathcal{B} = \{v, \tau v, \ldots, \tau^{n-1}v\}$$
for some $v \in V$ and $n > 0$. The basis $\mathcal{B}$ is called a $\tau$-cyclic basis for $S$. □




Thus, a cyclic submodule $\langle\langle v \rangle\rangle$ of $V_\tau$ with order $m(x)$ of degree $n$ is a $\tau$-cyclic subspace of $V$ of dimension $n$. The converse is also true, for if
$$\mathcal{B} = \{v, \tau v, \ldots, \tau^{n-1}v\}$$
is a basis for a $\tau$-invariant subspace $S$ of $V$, then $S$ is a submodule of $V_\tau$. Moreover, the minimal polynomial of $\tau|_S$ has degree $n$, since if
$$\tau^n v = -a_0 v - a_1 \tau v - \cdots - a_{n-1}\tau^{n-1}v$$
then $\tau|_S$ satisfies the polynomial
$$m(x) = a_0 + a_1 x + \cdots + a_{n-1}x^{n-1} + x^n$$
but none of smaller degree since $\mathcal{B}$ is linearly independent.

Theorem 7.5 Let $V$ be a finite-dimensional vector space and let $S \subseteq V$. The following are equivalent:
1) $S$ is a cyclic submodule of $V_\tau$ with order $m(x)$ of degree $n$
2) $S$ is a $\tau$-cyclic subspace of $V$ of dimension $n$. □
We will have more to say about cyclic modules a bit later in the chapter. 
Summary 


The following table summarizes the connection between the module concepts 
and the vector space concepts that we have discussed so far. 


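$F[x]$-Module $V_\tau$ | $F$-Vector Space $V$
Scalar multiplication: $p(x)v$ | Action of $p(\tau)$: $p(\tau)(v)$
Submodule of $V_\tau$ | $\tau$-invariant subspace of $V$
Annihilator: $\operatorname{ann}(V_\tau) = \{p(x) \mid p(x)V_\tau = \{0\}\}$ | Annihilator: $\operatorname{ann}(V) = \{p(x) \mid p(\tau)(V) = \{0\}\}$
Monic order $m(x)$ of $V_\tau$: $\operatorname{ann}(V_\tau) = \langle m(x) \rangle$ | Minimal polynomial of $\tau$: $m(x)$ has smallest degree with $m(\tau) = 0$
Cyclic submodule of $V_\tau$: $\langle\langle v \rangle\rangle = \{p(x)v \mid \deg p(x) < \deg m(x)\}$ | $\tau$-cyclic subspace of $V$: $\langle v, \tau v, \ldots, \tau^{n-1}v \rangle$, $n = \deg(m(x))$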


The Primary Cyclic Decomposition of $V_\tau$


We are now ready to translate the cyclic decomposition theorem into the language of $V_\tau$.


Definition Let $\tau \in \mathcal{L}(V)$.

1) The elementary divisors and invariant factors of $\tau$ are the monic elementary divisors and invariant factors, respectively, of the module $V_\tau$. We denote the multiset of elementary divisors of $\tau$ by $\operatorname{ElemDiv}(\tau)$ and the multiset of invariant factors of $\tau$ by $\operatorname{InvFact}(\tau)$.




2) The elementary divisors and invariant factors of a matrix $A$ are the elementary divisors and invariant factors, respectively, of the multiplication operator $\tau_A$:
$$\operatorname{ElemDiv}(A) = \operatorname{ElemDiv}(\tau_A) \quad\text{and}\quad \operatorname{InvFact}(A) = \operatorname{InvFact}(\tau_A)$$ □


We emphasize that the elementary divisors and invariant factors of an operator 
or matrix are monic by definition. Thus, we no longer need to worry about 
uniqueness up to associate. 


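Theorem 7.6 (The primary cyclic decomposition theorem for $V_\tau$) Let $V$ be finite-dimensional and let $\tau \in \mathcal{L}(V)$ have minimal polynomial
$$m_\tau(x) = p_1^{e_1}(x) \cdots p_n^{e_n}(x)$$
where the polynomials $p_i(x)$ are distinct monic primes.
1) (Primary decomposition) The $F[x]$-module $V_\tau$ is the direct sum
$$V_\tau = V_{p_1} \oplus \cdots \oplus V_{p_n}$$
where
$$V_{p_i} = \frac{m_\tau(x)}{p_i^{e_i}(x)}V = \{v \in V \mid p_i^{e_i}(\tau)(v) = 0\}$$
is a primary submodule of $V_\tau$ of order $p_i^{e_i}(x)$. In vector space terms, $V_{p_i}$ is a $\tau$-invariant subspace of $V$ and the minimal polynomial of $\tau|_{V_{p_i}}$ is
$$\min(\tau|_{V_{p_i}}) = p_i^{e_i}(x)$$
2) (Cyclic decomposition) Each primary summand $V_{p_i}$ can be decomposed into a direct sum
$$V_{p_i} = \langle\langle v_{i,1} \rangle\rangle \oplus \cdots \oplus \langle\langle v_{i,k_i} \rangle\rangle$$
of $\tau$-cyclic submodules $\langle\langle v_{i,j} \rangle\rangle$ of order $p_i^{e_{i,j}}(x)$ with
$$e_i = e_{i,1} \geq e_{i,2} \geq \cdots \geq e_{i,k_i}$$
In vector space terms, $\langle\langle v_{i,j} \rangle\rangle$ is a $\tau$-cyclic subspace of $V$, and the minimal polynomial of $\tau|_{\langle\langle v_{i,j} \rangle\rangle}$ is
$$\min(\tau|_{\langle\langle v_{i,j} \rangle\rangle}) = p_i^{e_{i,j}}(x)$$
3) (The complete decomposition) This yields the decomposition of $V$ into a direct sum of $\tau$-cyclic subspaces
$$V_\tau = (\langle\langle v_{1,1} \rangle\rangle \oplus \cdots \oplus \langle\langle v_{1,k_1} \rangle\rangle) \oplus \cdots \oplus (\langle\langle v_{n,1} \rangle\rangle \oplus \cdots \oplus \langle\langle v_{n,k_n} \rangle\rangle)$$
4) (Elementary divisors and dimensions) The multiset of elementary divisors $\{p_i^{e_{i,j}}(x)\}$ is uniquely determined by $\tau$. If $\deg(p_i^{e_{i,j}}(x)) = d_{i,j}$, then the $\tau$-cyclic subspace $\langle\langle v_{i,j} \rangle\rangle$ has $\tau$-cyclic basis
$$\mathcal{B}_{i,j} = (v_{i,j}, \tau v_{i,j}, \ldots, \tau^{d_{i,j}-1}v_{i,j})$$
and $\dim(\langle\langle v_{i,j} \rangle\rangle) = \deg(p_i^{e_{i,j}})$. Hence,
$$\dim(V_{p_i}) = \sum_{j=1}^{k_i} \deg(p_i^{e_{i,j}})$$
We will call the basis
$$\mathcal{R} = \bigcup_{i,j} \mathcal{B}_{i,j}$$
for $V$ the elementary divisor basis for $V_\tau$. □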


Recall that if $V = A \oplus B$ and if both $A$ and $B$ are $\tau$-invariant subspaces of $V$, the pair $(A, B)$ is said to reduce $\tau$. In module language, the pair $(A, B)$ reduces $\tau$ if $A$ and $B$ are submodules of $V_\tau$ and
$$V_\tau = A_\tau \oplus B_\tau$$


We can now translate Theorem 6.15 into the current context. 


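Theorem 7.7 Let $\tau \in \mathcal{L}(V)$ and let
$$V_\tau = A_\tau \oplus B_\tau$$
1) The minimal polynomial of $\tau$ is
$$m_\tau(x) = \operatorname{lcm}(m_{\tau|_A}(x), m_{\tau|_B}(x))$$
2) The primary cyclic decomposition of $V_\tau$ is the direct sum of the primary cyclic decompositions of $A_\tau$ and $B_\tau$; that is, if
$$A_\tau = \bigoplus \langle\langle a_{i,j} \rangle\rangle \quad\text{and}\quad B_\tau = \bigoplus \langle\langle b_{i,j} \rangle\rangle$$
are the primary cyclic decompositions of $A_\tau$ and $B_\tau$, respectively, then
$$V_\tau = \Big(\bigoplus \langle\langle a_{i,j} \rangle\rangle\Big) \oplus \Big(\bigoplus \langle\langle b_{i,j} \rangle\rangle\Big)$$
is the primary cyclic decomposition of $V_\tau$.
3) The elementary divisors of $\tau$ are
$$\operatorname{ElemDiv}(\tau) = \operatorname{ElemDiv}(\tau|_A) \cup \operatorname{ElemDiv}(\tau|_B)$$
where the union is a multiset union; that is, we keep all duplicate members. □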




The Characteristic Polynomial 


To continue our translation project, we need a definition. Recall that in the 
characterization of cyclic modules in Theorem 6.17, we made reference to the 
product of the elementary divisors, one from each associate class. Now that we 
have singled out a special representative from each associate class, we can make 
a useful definition. 


Definition Let $\tau \in \mathcal{L}(V)$. The characteristic polynomial $c_\tau(x)$ of $\tau$ is the product of all of the elementary divisors of $\tau$:
$$c_\tau(x) = \prod_{i,j} p_i^{e_{i,j}}(x)$$
Hence,
$$\deg(c_\tau(x)) = \dim(V)$$
Similarly, the characteristic polynomial $c_M(x)$ of a matrix $M$ is the product of the elementary divisors of $M$. □


The following theorem describes the relationship between the minimal and 
characteristic polynomials. 


Theorem 7.8 Let $\tau \in \mathcal{L}(V)$.
1) (The Cayley-Hamilton theorem) The minimal polynomial of $\tau$ divides the characteristic polynomial of $\tau$:
$$m_\tau(x) \mid c_\tau(x)$$
Equivalently, $\tau$ satisfies its own characteristic polynomial, that is,
$$c_\tau(\tau) = 0$$
2) The minimal polynomial
$$m_\tau(x) = p_1^{e_{1,1}}(x) \cdots p_n^{e_{n,1}}(x)$$
and characteristic polynomial
$$c_\tau(x) = \prod_{i,j} p_i^{e_{i,j}}(x)$$
of $\tau$ have the same set of prime factors $p_i(x)$ and hence the same set of roots (not counting multiplicity). □


We have seen that the multiset of elementary divisors forms a complete invariant for similarity. The reader should construct an example to show that the pair $(m_\tau(x), c_\tau(x))$ is not a complete invariant for similarity, that is, this pair of polynomials does not uniquely determine the multiset of elementary divisors of the operator $\tau$.
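
One possible example of this kind, offered as an illustration rather than as the one the text has in mind, uses two nilpotent $4 \times 4$ matrices. The matrix `A` below is block diagonal with two copies of the companion matrix of $x^2$, and `B` has one such block followed by two zero blocks, so their elementary divisors are $\{x^2, x^2\}$ and $\{x^2, x, x\}$. Both are nonzero with square zero, so both have minimal polynomial $x^2$ and characteristic polynomial $x^4$, yet their ranks differ and so they cannot be similar. The names `A`, `B`, `mat_mul` and `rank` are ours.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def rank(M):
    """Rank by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


# Elementary divisors {x^2, x^2}: two companion blocks of x^2.
A = [[0, 0, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 1, 0]]

# Elementary divisors {x^2, x, x}: one companion block of x^2, two 1x1 zero blocks.
B = [[0, 0, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]

ZERO = [[0] * 4 for _ in range(4)]
```

Since similar matrices have equal rank, comparing `rank(A)` with `rank(B)` is enough to separate the two similarity classes here.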


In general, the minimal polynomial of a linear operator is hard to find. One of the virtues of the characteristic polynomial is that it is comparatively easy to find and we will discuss this in detail a bit later in the chapter.


Note that since $m_\tau(x) \mid c_\tau(x)$ and both polynomials are monic, it follows that
$$m_\tau(x) = c_\tau(x) \iff \deg(m_\tau(x)) = \deg(c_\tau(x))$$


Definition A linear operator $\tau \in \mathcal{L}(V)$ is nonderogatory if its minimal polynomial is equal to its characteristic polynomial:
$$m_\tau(x) = c_\tau(x)$$
or equivalently, if
$$\deg(m_\tau(x)) = \deg(c_\tau(x))$$
or if
$$\deg(m_\tau(x)) = \dim(V)$$
Similar statements hold for matrices. □
Cyclic and Indecomposable Modules 


We have seen (Theorem 6.17) that cyclic submodules can be characterized by their elementary divisors. Let us translate this theorem into the language of $V_\tau$ (and add one more equivalence related to the characteristic polynomial).


Theorem 7.9 Let $\tau \in \mathcal{L}(V)$ have minimal polynomial
$$m_\tau(x) = p_1^{e_1}(x) \cdots p_n^{e_n}(x)$$
where $p_i(x)$ are distinct monic primes. The following are equivalent:
1) $V_\tau$ is cyclic.
2) $V_\tau$ is the direct sum
$$V_\tau = \langle\langle v_1 \rangle\rangle \oplus \cdots \oplus \langle\langle v_n \rangle\rangle$$
of $\tau$-cyclic submodules $\langle\langle v_i \rangle\rangle$ of order $p_i^{e_i}(x)$.
3) The elementary divisors of $\tau$ are
$$\operatorname{ElemDiv}(\tau) = \{p_1^{e_1}(x), \ldots, p_n^{e_n}(x)\}$$
4) $\tau$ is nonderogatory, that is,
$$m_\tau(x) = c_\tau(x)$$ □




Indecomposable Modules 


We have also seen (Theorem 6.19) that, in the language of $V_\tau$, each prime factor $p(x)$ of the minimal polynomial $m_\tau(x)$ gives rise to a cyclic submodule $W$ of $V_\tau$ of prime order $p(x)$.


Theorem 7.10 Let $\tau \in \mathcal{L}(V)$ and let $p(x)$ be a prime factor of $m_\tau(x)$. Then $V_\tau$ has a cyclic submodule $W_\tau$ of prime order $p(x)$. □


For a module of prime order, we have the following.


Theorem 7.11 For a module $W_\tau$ of prime order $p(x)$, the following are equivalent:

1) $W_\tau$ is cyclic

2) $W_\tau$ is indecomposable

3) $c_\tau(x)$ is irreducible

4) $\tau$ is nonderogatory, that is, $c_\tau(x) = m_\tau(x)$

5) $\dim(W_\tau) = \deg(p(x))$. □


Our translation project is now complete and we can begin to look at issues that are specific to the modules $V_\tau$.
Companion Matrices 


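We can also characterize the cyclic modules $V_\tau$ via the matrix representations of the operator $\tau$, which is obviously something that we could not do for arbitrary modules. Let $V_\tau = \langle\langle v \rangle\rangle$ be a cyclic module, with order
$$m_\tau(x) = a_0 + a_1 x + \cdots + a_{n-1}x^{n-1} + x^n$$
and ordered $\tau$-cyclic basis
$$\mathcal{B} = (v, \tau v, \ldots, \tau^{n-1}v)$$
Then
$$\tau(\tau^i v) = \tau^{i+1}v$$
for $0 \leq i \leq n - 2$ and
$$\tau(\tau^{n-1}v) = \tau^n v = -(a_0 + a_1\tau + \cdots + a_{n-1}\tau^{n-1})v = -a_0 v - a_1\tau v - \cdots - a_{n-1}\tau^{n-1}v$$
and so
$$[\tau]_{\mathcal{B}} = \begin{pmatrix} 0 & 0 & \cdots & 0 & -a_0 \\ 1 & 0 & \cdots & 0 & -a_1 \\ 0 & 1 & \ddots & \vdots & \vdots \\ \vdots & \vdots & \ddots & 0 & -a_{n-2} \\ 0 & 0 & \cdots & 1 & -a_{n-1} \end{pmatrix}$$

This matrix is known as the companion matrix for the polynomial $m_\tau(x)$.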
Definition The companion matrix of a monic polynomial
$$p(x) = a_0 + a_1 x + \cdots + a_{n-1}x^{n-1} + x^n$$
is the matrix
$$C[p(x)] = \begin{pmatrix} 0 & 0 & \cdots & 0 & -a_0 \\ 1 & 0 & \cdots & 0 & -a_1 \\ 0 & 1 & \ddots & \vdots & \vdots \\ \vdots & \vdots & \ddots & 0 & -a_{n-2} \\ 0 & 0 & \cdots & 1 & -a_{n-1} \end{pmatrix}$$ □


Note that companion matrices are defined only for monic polynomials. Companion matrices are nonderogatory. Also, companion matrices are precisely the matrices that represent operators on $\tau$-cyclic subspaces.
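
To make the definition concrete, here is a small sketch that builds $C[p(x)]$ from the coefficient list $[a_0, \ldots, a_{n-1}]$ and evaluates $p$ at a matrix by Horner's rule. The function names `companion` and `poly_at_matrix` and the list-of-rows representation are our own choices, not notation from the text.

```python
def companion(coeffs):
    """Companion matrix of the monic polynomial a_0 + ... + a_{n-1} x^{n-1} + x^n.

    coeffs = [a_0, ..., a_{n-1}]; the leading coefficient 1 is implicit.
    """
    n = len(coeffs)
    C = [[0] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = 1           # ones on the subdiagonal
    for i in range(n):
        C[i][n - 1] = -coeffs[i]  # last column holds -a_0, ..., -a_{n-1}
    return C


def poly_at_matrix(coeffs, M):
    """Evaluate the monic polynomial with low-order coefficients coeffs at M."""
    n = len(M)
    result = [[int(i == j) for j in range(n)] for i in range(n)]  # leading term
    for a in reversed(coeffs):
        # Horner step: result <- result * M + a * I
        result = [[sum(result[i][k] * M[k][j] for k in range(n)) + a * int(i == j)
                   for j in range(n)] for i in range(n)]
    return result
```

For instance, with $p(x) = (x^2 + 1)^2 = 1 + 2x^2 + x^4$, the call `companion([1, 0, 2, 0])` gives a $4 \times 4$ matrix $C$ and `poly_at_matrix([1, 0, 2, 0], C)` is the zero matrix, as the claim $p(C[p(x)]) = 0$ predicts.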


Theorem 7.12 Let $p(x) \in F[x]$ be monic.
1) A companion matrix $A = C[p(x)]$ is nonderogatory; in fact,
$$c_A(x) = m_A(x) = p(x)$$
2) $V_\tau$ is cyclic if and only if $\tau$ can be represented by a companion matrix, in which case the representing basis is $\tau$-cyclic.

Proof. For part 1), let $\mathcal{E} = (e_1, \ldots, e_n)$ be the standard basis for $F^n$. Since $e_i = A^{i-1}e_1$ for $i \geq 2$, it follows that for any polynomial $f(x)$,
$$f(A) = 0 \iff f(A)e_i = 0 \text{ for all } i \iff f(A)e_1 = 0$$
If $p(x) = a_0 + a_1 x + \cdots + a_{n-1}x^{n-1} + x^n$, then
$$p(A)e_1 = \sum_{i=0}^{n-1} a_i A^i e_1 + A^n e_1 = \sum_{i=0}^{n-1} a_i e_{i+1} - \sum_{i=0}^{n-1} a_i e_{i+1} = 0$$
and so $p(A)e_1 = 0$, whence $p(A) = 0$. Also, if
$$q(x) = b_0 + b_1 x + \cdots + b_{m-1}x^{m-1} + b_m x^m$$
is nonzero and has degree $m < n$, then
$$q(A)e_1 = b_0 e_1 + b_1 e_2 + \cdots + b_{m-1}e_m + b_m e_{m+1} \neq 0$$
since $\mathcal{E}$ is linearly independent. Hence, $p(x)$ has smallest degree among all polynomials satisfied by $A$ and so $p(x) = m_A(x)$. Finally,
$$\deg(m_A(x)) = \deg(p(x)) = \deg(c_A(x))$$
and so $m_A(x) = c_A(x)$.

For part 2), we have already proved that if $V_\tau$ is cyclic with $\tau$-cyclic basis $\mathcal{B}$, then $[\tau]_{\mathcal{B}} = C[p(x)]$. For the converse, if $[\tau]_{\mathcal{B}} = C[p(x)]$, then part 1) implies that $\tau$ is nonderogatory. Hence, Theorem 7.9 implies that $V_\tau$ is cyclic. It is clear from the form of $C[p(x)]$ that $\mathcal{B}$ is a $\tau$-cyclic basis for $V$. □


The Big Picture 


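If $\sigma, \tau \in \mathcal{L}(V)$, then Theorem 7.2 and the fact that the elementary divisors form a complete invariant for isomorphism imply that
$$\sigma \sim \tau \iff V_\sigma \approx V_\tau \iff \operatorname{ElemDiv}(\tau) = \operatorname{ElemDiv}(\sigma)$$

Hence, the multiset of elementary divisors is a complete invariant for similarity of operators. Of course, the same is true for matrices:
$$A \sim B \iff F^n_A \approx F^n_B \iff \operatorname{ElemDiv}(A) = \operatorname{ElemDiv}(B)$$
where we write $F^n_A$ in place of $F^n_{\tau_A}$.

The connection between the elementary divisors of an operator $\tau$ and the elementary divisors of the matrix representations of $\tau$ is described as follows. If $A = [\tau]_{\mathcal{B}}$, then the coordinate map $\phi_{\mathcal{B}}\colon V \approx F^n$ is also a module isomorphism $\phi_{\mathcal{B}}\colon V_\tau \to F^n_A$. Specifically, we have
$$\phi_{\mathcal{B}}(p(\tau)v) = [p(\tau)v]_{\mathcal{B}} = p([\tau]_{\mathcal{B}})[v]_{\mathcal{B}} = p(A)\phi_{\mathcal{B}}(v)$$
and so $\phi_{\mathcal{B}}$ preserves $F[x]$-scalar multiplication. Hence,
$$A = [\tau]_{\mathcal{B}} \text{ for some } \mathcal{B} \implies V_\tau \approx F^n_A$$

For the converse, suppose that $\sigma\colon V_\tau \approx F^n_A$. If we define $b_i \in V$ by $\sigma b_i = e_i$, where $e_i$ is the $i$th standard basis vector, then $\mathcal{B} = (b_1, \ldots, b_n)$ is an ordered basis for $V$ and $\sigma = \phi_{\mathcal{B}}$ is the coordinate map for $\mathcal{B}$. Hence, $\phi_{\mathcal{B}}$ is a module isomorphism and so
$$\phi_{\mathcal{B}}(\tau v) = \tau_A(\phi_{\mathcal{B}} v)$$
for all $v \in V$, that is,
$$[\tau v]_{\mathcal{B}} = \tau_A([v]_{\mathcal{B}})$$
which shows that $A = [\tau]_{\mathcal{B}}$.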


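Theorem 7.13 Let $V$ be a finite-dimensional vector space over $F$. Let $\sigma, \tau \in \mathcal{L}(V)$ and let $A, B \in M_n(F)$.

1) The multiset of elementary divisors (or invariant factors) is a complete invariant for similarity of operators, that is,
$$\sigma \sim \tau \iff V_\sigma \approx V_\tau \iff \operatorname{ElemDiv}(\tau) = \operatorname{ElemDiv}(\sigma) \iff \operatorname{InvFact}(\tau) = \operatorname{InvFact}(\sigma)$$
A similar statement holds for matrices:
$$A \sim B \iff F^n_A \approx F^n_B \iff \operatorname{ElemDiv}(A) = \operatorname{ElemDiv}(B) \iff \operatorname{InvFact}(A) = \operatorname{InvFact}(B)$$

2) The connection between operators and their representing matrices is
$$A = [\tau]_{\mathcal{B}} \text{ for some } \mathcal{B} \iff V_\tau \approx F^n_A \iff \operatorname{ElemDiv}(\tau) = \operatorname{ElemDiv}(A) \iff \operatorname{InvFact}(\tau) = \operatorname{InvFact}(A)$$ □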


Theorem 7.13 can be summarized in Figure 7.1, which shows the big picture. 


[Figure: similarity classes of $\mathcal{L}(V)$ <-> isomorphism classes of $F[x]$-modules <-> multisets of elementary divisors <-> similarity classes of matrices]

Figure 7.1


Figure 7.1 shows that the similarity classes of $\mathcal{L}(V)$ are in one-to-one correspondence with the isomorphism classes of $F[x]$-modules $V_\tau$ and that these are in one-to-one correspondence with the multisets of elementary divisors, which, in turn, are in one-to-one correspondence with the similarity classes of matrices.


We will see shortly that any multiset of prime power polynomials is the multiset of elementary divisors for some operator (or matrix) and so the third family in the figure could be replaced by the family of all multisets of prime power polynomials.
The Rational Canonical Form 


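We are now ready to determine a set of canonical forms for similarity. Let $\tau \in \mathcal{L}(V)$. The elementary divisor basis $\mathcal{R}$ for $V_\tau$ that gives the primary cyclic decomposition of $V_\tau$,
$$V_\tau = (\langle\langle v_{1,1} \rangle\rangle \oplus \cdots \oplus \langle\langle v_{1,k_1} \rangle\rangle) \oplus \cdots \oplus (\langle\langle v_{n,1} \rangle\rangle \oplus \cdots \oplus \langle\langle v_{n,k_n} \rangle\rangle)$$
is the union of the bases
$$\mathcal{B}_{i,j} = (v_{i,j}, \tau v_{i,j}, \ldots, \tau^{d_{i,j}-1}v_{i,j})$$
and so the matrix of $\tau$ with respect to $\mathcal{R}$ is the block diagonal matrix
$$[\tau]_{\mathcal{R}} = \operatorname{diag}(C[p_1^{e_{1,1}}(x)], \ldots, C[p_1^{e_{1,k_1}}(x)], \ldots, C[p_n^{e_{n,1}}(x)], \ldots, C[p_n^{e_{n,k_n}}(x)])$$
with companion matrices on the block diagonal. This matrix has the following form.


Definition A matrix $A$ is in the elementary divisor form of rational canonical form if
$$A = \operatorname{diag}(C[r_1^{e_1}(x)], \ldots, C[r_n^{e_n}(x)])$$
where the $r_i(x)$ are monic prime polynomials. □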


Thus, as shown in Figure 7.1, each similarity class S contains at least one matrix 
in the elementary divisor form of rational canonical form. 


On the other hand, suppose that $M$ is a rational canonical matrix
$$M = \operatorname{diag}(C[q_1^{f_{1,1}}(x)], \ldots, C[q_1^{f_{1,j_1}}(x)], \ldots, C[q_m^{f_{m,1}}(x)], \ldots, C[q_m^{f_{m,j_m}}(x)])$$
of size $d \times d$. Then $M$ represents the matrix multiplication operator $\tau_M$ under the standard basis $\mathcal{E}$ on $F^d$. The basis $\mathcal{E}$ can be partitioned into blocks $\mathcal{E}_{i,k}$ corresponding to the position of each of the companion matrices on the block diagonal of $M$. Since
$$[\tau_M|_{\langle \mathcal{E}_{i,k} \rangle}]_{\mathcal{E}_{i,k}} = C[q_i^{f_{i,k}}(x)]$$
it follows from Theorem 7.12 that each subspace $\langle \mathcal{E}_{i,k} \rangle$ is $\tau_M$-cyclic with monic order $q_i^{f_{i,k}}(x)$ and so Theorem 7.9 implies that the multiset of elementary divisors of $\tau_M$ is $\{q_i^{f_{i,k}}(x)\}$.


This shows two important things. First, any multiset of prime power polynomials is the multiset of elementary divisors for some matrix. Second, $M$ lies in the similarity class that is associated with the elementary divisors $\{q_i^{f_{i,k}}(x)\}$. Hence, two matrices in the elementary divisor form of rational canonical form lie in the same similarity class if and only if they have the same multiset of elementary divisors. In other words, the elementary divisor form of rational canonical form is a set of canonical forms for similarity, up to order of blocks on the block diagonal.


Theorem 7.14 (The rational canonical form: elementary divisor version) Let $V$ be a finite-dimensional vector space and let $\tau \in \mathcal{L}(V)$ have minimal polynomial
$$m_\tau(x) = p_1^{e_1}(x) \cdots p_n^{e_n}(x)$$
where the $p_i(x)$'s are distinct monic prime polynomials.
1) If $\mathcal{R}$ is an elementary divisor basis for $V_\tau$, then $[\tau]_{\mathcal{R}}$ is in the elementary divisor form of rational canonical form:
$$[\tau]_{\mathcal{R}} = \operatorname{diag}(C[p_1^{e_{1,1}}(x)], \ldots, C[p_1^{e_{1,k_1}}(x)], \ldots, C[p_n^{e_{n,1}}(x)], \ldots, C[p_n^{e_{n,k_n}}(x)])$$
where $p_i^{e_{i,j}}(x)$ are the elementary divisors of $\tau$. This block diagonal matrix is called an elementary divisor version of a rational canonical form of $\tau$.

2) Each similarity class $\mathcal{S}$ of matrices contains a matrix $R$ in the elementary divisor form of rational canonical form. Moreover, the set of matrices in $\mathcal{S}$ that have this form is the set of matrices obtained from $R$ by reordering the block diagonal matrices. Any such matrix is called an elementary divisor version of a rational canonical form of $A$, for any $A \in \mathcal{S}$.

3) The dimension of $V$ is the sum of the degrees of the elementary divisors of $\tau$, that is,
$$\dim(V) = \sum_{i=1}^{n} \sum_{j=1}^{k_i} \deg(p_i^{e_{i,j}})$$ □


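Example 7.1 Let $\tau$ be a linear operator on the vector space $\mathbb{R}^7$ and suppose that $\tau$ has minimal polynomial
$$m_\tau(x) = (x - 1)(x^2 + 1)^2$$

Noting that $x - 1$ and $(x^2 + 1)^2$ are elementary divisors and that the sum of the degrees of all elementary divisors must equal 7, we have two possibilities:

1) $x - 1$, $(x^2 + 1)^2$, $x^2 + 1$
2) $x - 1$, $x - 1$, $x - 1$, $(x^2 + 1)^2$

These correspond to the following rational canonical forms:

1) $$\begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & -1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & -2 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix}$$

2) $$\begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & -1 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & -2 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix}$$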
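
The two matrices of Example 7.1 can be assembled mechanically from their elementary divisors. The sketch below does this with two helper functions of our own, `companion` and `block_diag`; each polynomial is entered by hand as its list of non-leading coefficients $[a_0, \ldots, a_{n-1}]$, so that $(x^2 + 1)^2 = 1 + 2x^2 + x^4$ becomes `[1, 0, 2, 0]`.

```python
def companion(coeffs):
    """Companion matrix of a_0 + ... + a_{n-1} x^{n-1} + x^n, coeffs = [a_0, ..., a_{n-1}]."""
    n = len(coeffs)
    C = [[0] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = 1           # ones on the subdiagonal
    for i in range(n):
        C[i][n - 1] = -coeffs[i]  # negated coefficients in the last column
    return C


def block_diag(blocks):
    """Block diagonal matrix with the given square blocks, in the given order."""
    size = sum(len(b) for b in blocks)
    M = [[0] * size for _ in range(size)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, entry in enumerate(row):
                M[offset + i][offset + j] = entry
        offset += len(b)
    return M


x_minus_1 = [-1]             # x - 1
x2_plus_1 = [1, 0]           # x^2 + 1
x2_plus_1_sq = [1, 0, 2, 0]  # (x^2 + 1)^2 = x^4 + 2x^2 + 1

# Possibility 1): x - 1, (x^2 + 1)^2, x^2 + 1
form_1 = block_diag([companion(x_minus_1), companion(x2_plus_1_sq), companion(x2_plus_1)])

# Possibility 2): x - 1, x - 1, x - 1, (x^2 + 1)^2
form_2 = block_diag([companion(x_minus_1)] * 3 + [companion(x2_plus_1_sq)])
```

Reordering the list passed to `block_diag` gives the other matrices in the same similarity class that have this form.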


The rational canonical form may be far from the ideal of simplicity that we had 
in mind for a set of simple canonical forms. Indeed, the rational canonical form 
can be important as a theoretical tool, more so than a practical one. 


The Invariant Factor Version 


There is also an invariant factor version of the rational canonical form. We 
begin with the following simple result. 


Theorem 7.15 If $p(x), q(x) \in F[x]$ are relatively prime polynomials, then
$$C[p(x)q(x)] \sim \begin{pmatrix} C[p(x)] & 0 \\ 0 & C[q(x)] \end{pmatrix}$$

Proof. Speaking in general terms, if an $m \times m$ matrix $A$ has minimal polynomial
$$m_A(x) = p_1^{e_1}(x) \cdots p_n^{e_n}(x)$$
of degree equal to the size $m$ of the matrix, then Theorem 7.14 implies that the elementary divisors of $A$ are precisely
$$p_1^{e_1}(x), \ldots, p_n^{e_n}(x)$$

Since the matrices $C[p(x)q(x)]$ and $\operatorname{diag}(C[p(x)], C[q(x)])$ have the same size $m \times m$ and the same minimal polynomial $p(x)q(x)$ of degree $m$, it follows that they have the same multiset of elementary divisors and so are similar. □


Definition A matrix $A$ is in the invariant factor form of rational canonical form if
$$A = \operatorname{diag}(C[s_1(x)], \ldots, C[s_n(x)])$$
where $s_{k+1}(x) \mid s_k(x)$ for $k = 1, \ldots, n - 1$. □


Theorem 7.15 can be used to rearrange and combine the companion matrices in 
an elementary divisor version of a rational canonical form R to produce an 
invariant factor version of rational canonical form that is similar to R. Also, this 
process is reversible. 
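
The regrouping can be sketched symbolically. In the illustration below, which uses our own conventions, each elementary divisor is a pair `(prime, exponent)` with the prime given as a label string. The $k$th invariant factor is taken to be the product, over all primes, of the $k$th largest power of each prime, and it is returned as a dictionary mapping each prime label to its exponent. No polynomial arithmetic is performed.

```python
def invariant_factors(elem_divs):
    """Group elementary divisors (prime_label, exponent) into invariant factors.

    Returns a list [s_1, s_2, ...] with s_{k+1} dividing s_k, where each s_k
    is a dict mapping a prime label to its exponent in s_k.
    """
    by_prime = {}
    for prime, exponent in elem_divs:
        by_prime.setdefault(prime, []).append(exponent)
    for exponents in by_prime.values():
        exponents.sort(reverse=True)  # largest power of each prime first
    count = max((len(e) for e in by_prime.values()), default=0)
    factors = []
    for k in range(count):
        # s_{k+1} collects the (k+1)st largest power of every prime that has one
        factors.append({p: e[k] for p, e in by_prime.items() if k < len(e)})
    return factors
```

Applied to the first possibility in Example 7.1, this gives the invariant factors $(x - 1)(x^2 + 1)^2$ and $x^2 + 1$; applied to the second, it gives $(x - 1)(x^2 + 1)^2$, $x - 1$ and $x - 1$.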


Theorem 7.16 (The rational canonical form: invariant factor version) Let $\dim(V) < \infty$ and suppose that $\tau \in \mathcal{L}(V)$ has minimal polynomial
$$m_\tau(x) = p_1^{e_1}(x) \cdots p_n^{e_n}(x)$$
where the monic polynomials $p_i(x)$ are distinct prime (irreducible) polynomials.
1) $V$ has an invariant factor basis $\mathcal{B}$, that is, a basis for which
$$[\tau]_{\mathcal{B}} = \operatorname{diag}(C[s_1(x)], \ldots, C[s_n(x)])$$
where the polynomials $s_k(x)$ are the invariant factors of $\tau$ and $s_{k+1}(x) \mid s_k(x)$. This block diagonal matrix is called an invariant factor version of a rational canonical form of $\tau$.

2) Each similarity class $\mathcal{S}$ of matrices contains a matrix $R$ in the invariant factor form of rational canonical form. Moreover, the set of matrices in $\mathcal{S}$ that have this form is the set of matrices obtained from $R$ by reordering the block diagonal matrices. Any such matrix is called an invariant factor version of a rational canonical form of $A$, for any $A \in \mathcal{S}$.

3) The dimension of $V$ is the sum of the degrees of the invariant factors of $\tau$, that is,
$$\dim(V) = \sum_{i=1}^{n} \deg(s_i)$$ □


The Determinant Form of the Characteristic Polynomial 


In general, the minimal polynomial of an operator $\tau$ is hard to find. One of the virtues of the characteristic polynomial is that it is comparatively easy to find. This also provides a nice example of the theoretical value of the rational canonical form.


Let us first take the case of a companion matrix. If $A = C[p_n(x)]$ is the companion matrix of a monic polynomial
$$p_n(x; a_0, \ldots, a_{n-1}) = a_0 + a_1 x + \cdots + a_{n-1}x^{n-1} + x^n$$
then how can we recover $p(x) = c_A(x)$ from $C[p(x)]$ by arithmetic operations?




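When $n = 2$, we can write $p_2(x)$ as
$$p_2(x; a_0, a_1) = a_0 + a_1 x + x^2 = x(x + a_1) + a_0$$
which looks suspiciously like a determinant:
$$p_2(x; a_0, a_1) = \det\begin{pmatrix} x & a_0 \\ -1 & x + a_1 \end{pmatrix} = \det\left(xI - \begin{pmatrix} 0 & -a_0 \\ 1 & -a_1 \end{pmatrix}\right) = \det(xI - C[p_2(x)])$$

So, let us define
$$A(x; a_0, \ldots, a_{n-1}) = xI - C[p_n(x)] = \begin{pmatrix} x & 0 & \cdots & 0 & a_0 \\ -1 & x & \cdots & 0 & a_1 \\ 0 & -1 & \ddots & \vdots & \vdots \\ \vdots & \vdots & \ddots & x & a_{n-2} \\ 0 & 0 & \cdots & -1 & x + a_{n-1} \end{pmatrix}$$
where $x$ is an independent variable. The determinant of this matrix is a polynomial in $x$ whose degree equals the number of parameters $a_0, \ldots, a_{n-1}$. We have just seen that
$$\det(A(x; a_0, a_1)) = p_2(x; a_0, a_1)$$
and this is also true for $n = 1$. As a basis for induction, if
$$\det(A(x; a_0, \ldots, a_{n-1})) = p_n(x; a_0, \ldots, a_{n-1})$$
then expanding along the first row gives
$$\begin{aligned} \det(A(x; a_0, \ldots, a_n)) &= x\det(A(x; a_1, \ldots, a_n)) + (-1)^n a_0 \det\begin{pmatrix} -1 & x & \cdots & 0 \\ 0 & -1 & \ddots & \vdots \\ \vdots & & \ddots & x \\ 0 & 0 & \cdots & -1 \end{pmatrix}_{n \times n} \\ &= x\det(A(x; a_1, \ldots, a_n)) + a_0 \\ &= x\,p_n(x; a_1, \ldots, a_n) + a_0 \\ &= a_1 x + a_2 x^2 + \cdots + a_n x^n + x^{n+1} + a_0 \\ &= p_{n+1}(x; a_0, \ldots, a_n) \end{aligned}$$

We have proved the following.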

We have proved the following. 




Lemma 7.17 For any monic $p(x) \in F[x]$,
$$\det(xI - C[p(x)]) = p(x)$$ □
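
The lemma can be spot-checked numerically. The sketch below is a sanity check under our own conventions, not a proof: it evaluates $\det(xI - C[p(x)])$ at integer values of $x$ by exact Gaussian elimination and compares the result with $p(x)$. Two polynomials of degree $n$ that agree at more than $n$ points are equal, so agreement at $n + 1$ or more sample points confirms the identity for that particular $p$. The names `companion`, `det` and `char_poly_value` are ours.

```python
from fractions import Fraction


def companion(coeffs):
    """Companion matrix of a_0 + ... + a_{n-1} x^{n-1} + x^n, coeffs = [a_0, ..., a_{n-1}]."""
    n = len(coeffs)
    C = [[0] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = 1
    for i in range(n):
        C[i][n - 1] = -coeffs[i]
    return C


def det(M):
    """Determinant by exact Gaussian elimination over the rationals."""
    A = [[Fraction(v) for v in row] for row in M]
    n = len(A)
    d = Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if A[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            A[c], A[p] = A[p], A[c]
            d = -d  # a row swap flips the sign
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return d


def char_poly_value(coeffs, x):
    """Value of det(xI - C[p]) at the number x."""
    C = companion(coeffs)
    n = len(C)
    return det([[x * int(i == j) - C[i][j] for j in range(n)] for i in range(n)])
```

With `coeffs = [1, 0, 2, 0]`, that is $p(x) = x^4 + 2x^2 + 1$, the value `char_poly_value(coeffs, 2)` should equal $p(2) = 25$.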
Now suppose that $R$ is a matrix in the elementary divisor form of rational canonical form. Since the determinant of a block diagonal matrix is the product of the determinants of the blocks on the diagonal, it follows that
$$\det(xI - R) = \prod_{i,j} p_i^{e_{i,j}}(x) = c_R(x)$$

Moreover, if $A \sim R$, say $A = PRP^{-1}$, then
$$\begin{aligned} \det(xI - A) &= \det(xI - PRP^{-1}) \\ &= \det[P(xI - R)P^{-1}] \\ &= \det(P)\det(xI - R)\det(P^{-1}) \\ &= \det(xI - R) \end{aligned}$$
and so
$$\det(xI - A) = \det(xI - R) = c_R(x) = c_A(x)$$

Hence, the fact that all matrices have a rational canonical form allows us to deduce the following theorem.


Theorem 7.18 Let $\tau \in \mathcal{L}(V)$. If $A$ is any matrix that represents $\tau$, then
$$c_\tau(x) = c_A(x) = \det(xI - A)$$ □


Changing the Base Field 


A change in the base field will generally change the primeness of polynomials 
and therefore has an effect on the multiset of elementary divisors. It is perhaps a 
surprising fact that a change of base field has no effect on the invariant factors— 
hence the adjective invariant. 


Theorem 7.19 Let $F$ and $K$ be fields with $F \subseteq K$. Suppose that the elementary divisors of a matrix $A \in M_n(F)$ are
$$\mathcal{P} = \{p_1^{e_{1,1}}, \ldots, p_1^{e_{1,k_1}}, \ldots, p_n^{e_{n,1}}, \ldots, p_n^{e_{n,k_n}}\}$$
Suppose also that the polynomials $p_i$ can be further factored over $K$, say
$$p_i = q_{i,1}^{d_{i,1}} \cdots q_{i,m_i}^{d_{i,m_i}}$$
where $q_{i,j}$ is prime over $K$. Then the prime powers
$$\mathcal{Q} = \{q_{1,1}^{d_{1,1}e_{1,1}}, \ldots, q_{1,m_1}^{d_{1,m_1}e_{1,1}}, \ldots, q_{n,1}^{d_{n,1}e_{n,k_n}}, \ldots, q_{n,m_n}^{d_{n,m_n}e_{n,k_n}}\}$$
are the elementary divisors of $A$ over $K$.

Proof. Consider the companion matrix $C[p_i^{e_{i,j}}(x)]$ in the rational canonical form of $A$ over $F$. This is a matrix over $K$ as well and Theorem 7.15 implies that
$$C[p_i^{e_{i,j}}(x)] \sim \operatorname{diag}(C[q_{i,1}^{d_{i,1}e_{i,j}}], \ldots, C[q_{i,m_i}^{d_{i,m_i}e_{i,j}}])$$
Hence, $\mathcal{Q}$ is the multiset of elementary divisors of $A$ over $K$. □


As mentioned, unlike the elementary divisors, the invariant factors are field independent. This is equivalent to saying that the invariant factors of a matrix $A \in M_n(F)$ are polynomials over the smallest subfield of $F$ that contains the entries of $A$.


Theorem 7.20 Let $A \in M_n(F)$ and let $E \subseteq F$ be the smallest subfield of $F$ that contains the entries of $A$.

1) The invariant factors of $A$ are polynomials over $E$.

2) Two matrices $A, B \in M_n(F)$ are similar over $F$ if and only if they are similar over $E$.

Proof. Part 1) follows immediately from Theorem 7.19, since using either $\mathcal{P}$ or $\mathcal{Q}$ to compute invariant factors gives the same result. Part 2) follows from the fact that two matrices are similar over a given field if and only if they have the same multiset of invariant factors over that field. □


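Example 7.2 Over the real field, the matrix
$$A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$$
is the companion matrix for the polynomial $x^2 + 1$, and so
$$\operatorname{ElemDiv}_{\mathbb{R}}(A) = \{x^2 + 1\} = \operatorname{InvFact}_{\mathbb{R}}(A)$$

However, as a complex matrix, the rational canonical form for $A$ is
$$\begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}$$
and so
$$\operatorname{ElemDiv}_{\mathbb{C}}(A) = \{x - i, x + i\} \quad\text{and}\quad \operatorname{InvFact}_{\mathbb{C}}(A) = \{x^2 + 1\}$$ □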


Exercises 


1. We have seen that any $\tau \in \mathcal{L}(V)$ can be used to make $V$ into an $F[x]$-module. Does every module $V$ over $F[x]$ come from some $\tau \in \mathcal{L}(V)$? Explain.

2. Let $\tau \in \mathcal{L}(V)$ have minimal polynomial
$$m_\tau(x) = p_1^{e_1}(x) \cdots p_n^{e_n}(x)$$
where $p_i(x)$ are distinct monic primes. Prove that the following are equivalent:
a) $V_\tau$ is $\tau$-cyclic.
b) $\deg(m_\tau(x)) = \dim(V)$.
c) The elementary divisors of $\tau$ are the prime power factors $p_i^{e_i}(x)$ and so
$$V_\tau = \langle\langle v_1 \rangle\rangle \oplus \cdots \oplus \langle\langle v_n \rangle\rangle$$
is a direct sum of $\tau$-cyclic submodules $\langle\langle v_i \rangle\rangle$ of order $p_i^{e_i}(x)$.

3. Prove that a matrix $A \in M_n(F)$ is nonderogatory if and only if it is similar to a companion matrix.

4. Show that if $A$ and $B$ are block diagonal matrices with the same blocks, but in possibly different order, then $A$ and $B$ are similar.

5. Let $A \in M_n(F)$. Justify the statement that the entries of any invariant factor version of a rational canonical form for $A$ are "rational" expressions in the coefficients of $A$, hence the origin of the term rational canonical form. Is the same true for the elementary divisor version?

6. Let $\tau \in \mathcal{L}(V)$ where $V$ is finite-dimensional. If $p(x) \in F[x]$ is irreducible and if $p(\tau)$ is not one-to-one, prove that $p(x)$ divides the minimal polynomial of $\tau$.

7. Prove that the minimal polynomial of $\tau \in \mathcal{L}(V)$ is the least common multiple of its elementary divisors.

8. Let $\tau \in \mathcal{L}(V)$ where $V$ is finite-dimensional. Describe conditions on the minimal polynomial of $\tau$ that are equivalent to the fact that the elementary divisor version of the rational canonical form of $\tau$ is diagonal. What can you say about the elementary divisors?

9. Verify the statement that the multiset of elementary divisors (or invariant factors) is a complete invariant for similarity of matrices.

10. Prove that given any multiset of monic prime power polynomials
$$M = \{p_1^{e_{1,1}}(x), \ldots, p_1^{e_{1,k_1}}(x), \ldots, p_n^{e_{n,1}}(x), \ldots, p_n^{e_{n,k_n}}(x)\}$$
and given any vector space $V$ of dimension equal to the sum of the degrees of these polynomials, there is an operator $\tau \in \mathcal{L}(V)$ whose multiset of elementary divisors is $M$.

11. Find all rational canonical forms (up to the order of the blocks on the diagonal) for a linear operator on $\mathbb{R}^6$ having minimal polynomial $(x - 1)^2(x + 1)^2$.

12. How many possible rational canonical forms (up to order of blocks) are there for linear operators on $\mathbb{R}^6$ with minimal polynomial $(x - 1)(x + 1)^2$?

13. a) Show that if $A$ and $B$ are $n \times n$ matrices, at least one of which is invertible, then $AB$ and $BA$ are similar.

b) What do the matrices
$$A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$$
have to do with this issue?

c) Show that even without the assumption on invertibility the matrices $AB$ and $BA$ have the same characteristic polynomial. Hint: Write
$$A = P I_{n,r} Q$$
where $P$ and $Q$ are invertible and $I_{n,r}$ is an $n \times n$ matrix that has the $r \times r$ identity in the upper left-hand corner and 0's elsewhere. Write $B' = QBP$. Compute $AB$ and $BA$ and find their characteristic polynomials.

14. Let $\tau$ be a linear operator on $F^4$ with minimal polynomial $m_\tau(x) = (x^2 + 1)(x^2 - 2)$. Find the rational canonical form for $\tau$ if $F = \mathbb{Q}$, $F = \mathbb{R}$ or $F = \mathbb{C}$.

15. Suppose that the minimal polynomial of $\tau \in \mathcal{L}(V)$ is irreducible. What can you say about the dimension of $V$?

16. Let $\tau \in \mathcal{L}(V)$ where $V$ is finite-dimensional. Suppose that $p(x)$ is an irreducible factor of the minimal polynomial $m(x)$ of $\tau$. Suppose further that $u, v \in V$ have the property that $o(u) = o(v) = p(x)$. Prove that $u = f(\tau)v$ for some polynomial $f(x)$ if and only if $v = g(\tau)u$ for some polynomial $g(x)$.


Chapter 8 
Eigenvalues and Eigenvectors 


Unless otherwise noted, we will assume throughout this chapter that all vector 
spaces are finite-dimensional. 


Eigenvalues and Eigenvectors 


We have seen that for any $\tau \in \mathcal{L}(V)$, the minimal and characteristic polynomials have the same set of roots (but not generally the same multiset of roots). These roots are of vital importance.


Let $A = [\tau]_{\mathcal{B}}$ be a matrix that represents $\tau$. A scalar $\lambda \in F$ is a root of the characteristic polynomial $c_\tau(x) = c_A(x) = \det(xI - A)$ if and only if
$$\det(\lambda I - A) = 0 \qquad (8.1)$$
that is, if and only if the matrix $\lambda I - A$ is singular. In particular, if $\dim(V) = n$, then (8.1) holds if and only if there exists a nonzero vector $x \in F^n$ for which
$$(\lambda I - A)x = 0$$
or equivalently,
$$\tau_A x = \lambda x$$
If $[v]_{\mathcal{B}} = x$, then this is equivalent to
$$[\tau]_{\mathcal{B}}[v]_{\mathcal{B}} = \lambda[v]_{\mathcal{B}}$$
or in operator language,
$$\tau v = \lambda v$$
This prompts the following definition.
Definition Let $V$ be a vector space over a field $F$ and let $\tau \in \mathcal{L}(V)$.

1) A scalar $\lambda \in F$ is an eigenvalue (or characteristic value) of $\tau$ if there exists a nonzero vector $v \in V$ for which
$$\tau v = \lambda v$$
In this case, $v$ is called an eigenvector (or characteristic vector) of $\tau$ associated with $\lambda$.

2) A scalar $\lambda \in F$ is an eigenvalue for a matrix $A$ if there exists a nonzero column vector $x$ for which
$$Ax = \lambda x$$
In this case, $x$ is called an eigenvector (or characteristic vector) for $A$ associated with $\lambda$.

3) The set of all eigenvectors associated with a given eigenvalue $\lambda$, together with the zero vector, forms a subspace of $V$, called the eigenspace of $\lambda$ and denoted by $\mathcal{E}_\lambda$. This applies to both linear operators and matrices.

4) The set of all eigenvalues of an operator or matrix is called the spectrum of the operator or matrix. We denote the spectrum of $\tau$ by $\operatorname{Spec}(\tau)$. □


Theorem 8.1 Let $\tau \in \mathcal{L}(V)$ have minimal polynomial $m_\tau(x)$ and characteristic polynomial $c_\tau(x)$.

1) The spectrum of $\tau$ is the set of all roots of $m_\tau(x)$ or of $c_\tau(x)$, not counting multiplicity.

2) The eigenvalues of a matrix are invariants under similarity.

3) The eigenspace $\mathcal{E}_\lambda$ of the matrix $A$ is the solution space to the homogeneous system of equations
$$(\lambda I - A)(x) = 0$$ □


One way to compute the eigenvalues of a linear operator $\tau$ is to first represent $\tau$ by a matrix $A$ and then solve the characteristic equation
$$\det(xI - A) = 0$$

Unfortunately, it is quite likely that this equation cannot be solved exactly when $\dim(V) \geq 5$, since polynomial equations of degree 5 or more are not in general solvable by radicals. As a result, the art of approximating the eigenvalues of a matrix is a very important area of applied linear algebra.
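
As one illustration of what such an approximation can look like, the sketch below uses power iteration, a standard numerical technique that is not discussed in the text. It assumes a real matrix with a single eigenvalue of largest absolute value and a starting vector that is not orthogonal to the corresponding eigenspace. Under those assumptions, repeatedly applying $A$ and rescaling drives the vector toward an eigenvector, and the Rayleigh quotient then estimates the eigenvalue. The function name `power_iteration`, the starting vector and the step count are our own choices.

```python
def power_iteration(A, steps=200):
    """Estimate the dominant eigenvalue of a real square matrix A.

    Assumes a unique eigenvalue of largest absolute value and that the
    starting vector (1, 0, ..., 0) has a component along its eigenvector.
    """
    n = len(A)
    v = [1.0] + [0.0] * (n - 1)
    for _ in range(steps):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = max(abs(x) for x in w)
        if norm == 0:
            return 0.0, v  # v lies in the kernel of A
        v = [x / norm for x in w]  # rescale to avoid overflow
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    # Rayleigh quotient (v . Av) / (v . v) as the eigenvalue estimate
    lam = sum(a * b for a, b in zip(v, Av)) / sum(a * a for a in v)
    return lam, v
```

For the matrix with rows $(2, 1)$ and $(1, 2)$, whose characteristic polynomial is $(x - 1)(x - 3)$, the estimate converges to the eigenvalue 3. The method says nothing about the remaining eigenvalues, and it can fail when the stated assumptions do not hold.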


The following theorem describes the relationship between eigenspaces and 
eigenvectors of distinct eigenvalues. 


Theorem 8.2 Suppose that Aı,..., Ag are distinct eigenvalues of a linear 

operator T € L(V). 

1) Eigenvectors associated with distinct eigenvalues are linearly independent; 
that is, if vi € E), then the set {v1,... , vx} is linearly independent. 

2) The sum E, +--+: + E, is direct; that is, Ey, ®-++ ® Ey, exists. 

Proof. For part 1), if $\{v_1, \dots, v_k\}$ is linearly dependent, then by renumbering if
necessary, we may assume that among all nontrivial linear combinations of
these vectors that equal 0, the equation

$$r_1 v_1 + \cdots + r_j v_j = 0 \qquad (8.2)$$

has the fewest number of terms. Applying $\tau$ gives

$$r_1 \lambda_1 v_1 + \cdots + r_j \lambda_j v_j = 0 \qquad (8.3)$$

Multiplying (8.2) by $\lambda_1$ and subtracting from (8.3) gives

$$r_2(\lambda_2 - \lambda_1)v_2 + \cdots + r_j(\lambda_j - \lambda_1)v_j = 0$$

But this equation has fewer terms than (8.2) and so all of its coefficients must
equal 0. Since the $\lambda_i$'s are distinct, $r_i = 0$ for $i \ge 2$ and so $r_1 = 0$ as well. This
contradiction implies that the $v_i$'s are linearly independent.


The next theorem describes the spectrum of a polynomial $p(\tau)$ in $\tau$.

Theorem 8.3 (The spectral mapping theorem) Let $V$ be a vector space over
an algebraically closed field $F$. Let $\tau \in \mathcal{L}(V)$ and let $p(x) \in F[x]$. Then

$$\operatorname{Spec}(p(\tau)) = p(\operatorname{Spec}(\tau)) = \{p(\lambda) \mid \lambda \in \operatorname{Spec}(\tau)\}$$


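Proof. We leave it as an exercise to show that if $\lambda$ is an eigenvalue of $\tau$, then
$p(\lambda)$ is an eigenvalue of $p(\tau)$. Hence, $p(\operatorname{Spec}(\tau)) \subseteq \operatorname{Spec}(p(\tau))$. For the reverse
inclusion, let $\lambda \in \operatorname{Spec}(p(\tau))$, that is,

$$(p(\tau) - \lambda)v = 0$$

for $v \ne 0$. If

$$p(x) - \lambda = (x - r_1)^{e_1} \cdots (x - r_n)^{e_n}$$

where $r_i \in F$, then writing this as a product of (not necessarily distinct) linear
factors, we have

$$(\tau - r_1) \cdots (\tau - r_1) \cdots (\tau - r_n) \cdots (\tau - r_n)v = 0$$

(The operator $r_k\iota$ is written $r_k$ for convenience.) We can remove factors from
the left end of this equation one by one until we arrive at an operator $\sigma$ (perhaps
the identity) for which $\sigma v \ne 0$ but $(\tau - r_k)\sigma v = 0$. Then $\sigma v$ is an eigenvector
for $\tau$ with eigenvalue $r_k$. But since $p(r_k) - \lambda = 0$, it follows that
$\lambda = p(r_k) \in p(\operatorname{Spec}(\tau))$. Hence, $\operatorname{Spec}(p(\tau)) \subseteq p(\operatorname{Spec}(\tau))$. □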
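The spectral mapping theorem can be checked by hand on a small triangular matrix, whose eigenvalues are its diagonal entries. The following Python sketch is only an illustration; the helper names `mat_mul` and `poly_of_matrix`, the matrix, and the polynomial are assumptions chosen for the example. The base field here is the rationals, which is not algebraically closed, but the characteristic polynomial of this particular matrix splits.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def poly_of_matrix(coeffs, A):
    """Evaluate p(A) by Horner's rule.

    coeffs lists the coefficients of p from highest degree down to the
    constant term.
    """
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for c in coeffs:
        result = mat_mul(result, A)
        for i in range(n):
            result[i][i] += c  # add c times the identity
    return result


A = [[1, 2], [0, 3]]        # upper triangular, so Spec(A) = {1, 3}
p = [1, 0, 1]               # p(x) = x^2 + 1
pA = poly_of_matrix(p, A)   # again upper triangular

spec_pA = {pA[i][i] for i in range(2)}          # diagonal of p(A)
p_of_spec = {lam * lam + 1 for lam in (1, 3)}   # p applied to Spec(A)
```

Both sets come out as {2, 10}, in agreement with the theorem.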


The Trace and the Determinant 


Let $F$ be algebraically closed and let $A \in \mathcal{M}_n(F)$ have characteristic
polynomial

$$c_A(x) = x^n + c_{n-1}x^{n-1} + \cdots + c_1 x + c_0 = (x - \lambda_1) \cdots (x - \lambda_n)$$

where $\lambda_1, \dots, \lambda_n$ are the eigenvalues of $A$. Then

$$c_A(x) = \det(xI - A)$$

and setting $x = 0$ gives $c_0 = \det(-A) = (-1)^n \det(A)$, that is,

$$\det(A) = (-1)^n c_0 = \lambda_1 \cdots \lambda_n$$

Hence, if $F$ is algebraically closed then $\det(A)$ is, up to sign, the constant term
of $c_A(x)$, and it is equal to the product of the eigenvalues of $A$, including multiplicity.


The sum of the eigenvalues of a matrix over an algebraically closed field is also 
an interesting quantity. Like the determinant, this quantity is one of the 
coefficients of the characteristic polynomial (up to sign) and can also be 
computed directly from the entries of the matrix, without knowing the 
eigenvalues explicitly. 


Definition The trace of a matrix $A \in \mathcal{M}_n(F)$, denoted by $\operatorname{tr}(A)$, is the sum of
the elements on the main diagonal of $A$. □


Here are the basic properties of the trace. The proof is left as an exercise.


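Theorem 8.4 Let $A, B, C \in \mathcal{M}_n(F)$.

1) $\operatorname{tr}(rA) = r\operatorname{tr}(A)$, for $r \in F$.

2) $\operatorname{tr}(A + B) = \operatorname{tr}(A) + \operatorname{tr}(B)$.

3) $\operatorname{tr}(AB) = \operatorname{tr}(BA)$.

4) $\operatorname{tr}(ABC) = \operatorname{tr}(CAB) = \operatorname{tr}(BCA)$. However, $\operatorname{tr}(ABC)$ may not equal
$\operatorname{tr}(ACB)$.

5) The trace is an invariant under similarity.

6) If $F$ is algebraically closed, then $\operatorname{tr}(A)$ is the sum of the eigenvalues of $A$,
including multiplicity, and so

$$\operatorname{tr}(A) = -c_{n-1}$$

where $c_A(x) = x^n + c_{n-1}x^{n-1} + \cdots + c_1 x + c_0$. □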
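Parts 3) and 4) are easy to test on small integer matrices. The sketch below is only an illustration; the helper names and the three matrices are assumptions chosen so that the cyclic permutations agree while $\operatorname{tr}(ABC)$ and $\operatorname{tr}(ACB)$ differ.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(A):
    """Sum of the entries on the main diagonal."""
    return sum(A[i][i] for i in range(len(A)))


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[1, 0], [0, 2]]

t_ab = trace(mat_mul(A, B))
t_ba = trace(mat_mul(B, A))

# Cyclic permutations of a product all have the same trace ...
t_abc = trace(mat_mul(mat_mul(A, B), C))
t_cab = trace(mat_mul(mat_mul(C, A), B))
t_bca = trace(mat_mul(mat_mul(B, C), A))

# ... but swapping two adjacent factors can change it.
t_acb = trace(mat_mul(mat_mul(A, C), B))
```

Here `t_ab` and `t_ba` are both 5, the three cyclic traces are all 8, and `t_acb` is 7.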


Since the trace is invariant under similarity, we can make the following 
definition. 


Definition The trace of a linear operator $\tau \in \mathcal{L}(V)$ is the trace of any matrix
that represents $\tau$. □


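As an aside, the reader who is familiar with symmetric polynomials knows that
the coefficients of any polynomial

$$p(x) = x^n + c_{n-1}x^{n-1} + \cdots + c_1 x + c_0 = (x - \lambda_1) \cdots (x - \lambda_n)$$

are, up to sign, the elementary symmetric functions of the roots:

$$c_{n-1} = (-1)^1 \sum_i \lambda_i$$
$$c_{n-2} = (-1)^2 \sum_{i<j} \lambda_i \lambda_j$$
$$\vdots$$
$$c_0 = (-1)^n \lambda_1 \cdots \lambda_n$$

The most important elementary symmetric functions of the eigenvalues are the
first and last ones:

$$c_{n-1} = -(\lambda_1 + \cdots + \lambda_n) = -\operatorname{tr}(A) \quad\text{and}\quad c_0 = (-1)^n \lambda_1 \cdots \lambda_n = (-1)^n \det(A)$$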


Geometric and Algebraic Multiplicities 


Eigenvalues actually have two forms of multiplicity, as described in the next 
definition. 


Definition Let $\lambda$ be an eigenvalue of a linear operator $\tau \in \mathcal{L}(V)$.

1) The algebraic multiplicity of $\lambda$ is the multiplicity of $\lambda$ as a root of the
characteristic polynomial $c_\tau(x)$.

2) The geometric multiplicity of $\lambda$ is the dimension of the eigenspace $\mathcal{E}_\lambda$. □


Theorem 8.5 The geometric multiplicity of an eigenvalue $\lambda$ of $\tau \in \mathcal{L}(V)$ is less
than or equal to its algebraic multiplicity.

Proof. We can extend any basis $\mathcal{B}_1 = \{v_1, \dots, v_k\}$ of $\mathcal{E}_\lambda$ to a basis $\mathcal{B}$ for $V$.
Since $\mathcal{E}_\lambda$ is invariant under $\tau$, the matrix of $\tau$ with respect to $\mathcal{B}$ has the block
form

$$[\tau]_{\mathcal{B}} = \begin{bmatrix} \lambda I_k & A \\ 0 & B \end{bmatrix}_{\text{block}}$$

where $A$ and $B$ are matrices of the appropriate sizes and so

$$c_\tau(x) = \det(xI - [\tau]_{\mathcal{B}}) = \det(xI_k - \lambda I_k)\det(xI_{n-k} - B) = (x - \lambda)^k \det(xI_{n-k} - B)$$

(Here $n$ is the dimension of $V$.) Hence, the algebraic multiplicity of $\lambda$ is at least
equal to the geometric multiplicity $k$ of $\lambda$. □
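The inequality in Theorem 8.5 can be strict. As a small worked example (the matrix is chosen here for illustration and does not appear in the text), take a $2 \times 2$ matrix with $\lambda$ on the diagonal and a single 1 below it:

```latex
J = \begin{bmatrix} \lambda & 0 \\ 1 & \lambda \end{bmatrix},
\qquad
c_J(x) = \det(xI - J) = (x - \lambda)^2 .

% Eigenspace: solve (\lambda I - J)x = 0.
\lambda I - J = \begin{bmatrix} 0 & 0 \\ -1 & 0 \end{bmatrix},
\qquad
(\lambda I - J)\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
  = \begin{bmatrix} 0 \\ -x_1 \end{bmatrix} = 0
\iff x_1 = 0 .

% Hence the eigenspace is spanned by e_2 = (0, 1)^t:
\dim \mathcal{E}_\lambda = 1 < 2 .
```

So the algebraic multiplicity of $\lambda$ is 2 while its geometric multiplicity is 1.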




The Jordan Canonical Form 


One of the virtues of the rational canonical form is that every linear operator on
a finite-dimensional vector space has a rational canonical form. However, as
mentioned earlier, the rational canonical form may be far from the ideal of
simplicity that we had in mind for a set of simple canonical forms and is really
more of a theoretical tool than a practical tool.

When the minimal polynomial $m_\tau(x)$ of $\tau$ splits over $F$,

$$m_\tau(x) = (x - \lambda_1)^{e_1} \cdots (x - \lambda_n)^{e_n}$$

there is another set of canonical forms that is arguably simpler than the set of
rational canonical forms.


In some sense, the complexity of the rational canonical form comes from the
choice of basis for the cyclic submodules $\langle\langle v_{i,j} \rangle\rangle$. Recall that the $\tau$-cyclic bases
have the form

$$\mathcal{B}_{i,j} = (v_{i,j}, \tau v_{i,j}, \dots, \tau^{d_{i,j}-1} v_{i,j})$$

where $d_{i,j} = \deg(p_i^{e_{i,j}})$. With this basis, all of the complexity comes at the end,
so to speak, when we attempt to express

$$\tau(\tau^{d_{i,j}-1}(v_{i,j})) = \tau^{d_{i,j}}(v_{i,j})$$

as a linear combination of the basis vectors.


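However, since $\mathcal{B}_{i,j}$ has the form

$$(v, \tau v, \tau^2 v, \dots, \tau^{d-1} v)$$

any ordered set of the form

$$(p_0(\tau)v, p_1(\tau)v, \dots, p_{d-1}(\tau)v)$$

where $\deg(p_k(x)) = k$ will also be a basis for $\langle\langle v_{i,j} \rangle\rangle$. In particular, when $m_\tau(x)$
splits over $F$, the elementary divisors are

$$p_i^{e_{i,j}}(x) = (x - \lambda_i)^{e_{i,j}}$$

and so the set

$$\mathcal{C}_{i,j} = (v_{i,j}, (\tau - \lambda_i)v_{i,j}, \dots, (\tau - \lambda_i)^{e_{i,j}-1} v_{i,j})$$

is also a basis for $\langle\langle v_{i,j} \rangle\rangle$.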


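If we temporarily denote the $k$th basis vector in $\mathcal{C}_{i,j}$ by $b_k$, then for
$k = 0, \dots, e_{i,j} - 2$,

$$\tau b_k = \tau[(\tau - \lambda_i)^k (v_{i,j})]$$
$$= (\tau - \lambda_i + \lambda_i)[(\tau - \lambda_i)^k (v_{i,j})]$$
$$= (\tau - \lambda_i)^{k+1}(v_{i,j}) + \lambda_i(\tau - \lambda_i)^k (v_{i,j})$$
$$= b_{k+1} + \lambda_i b_k$$

For $k = e_{i,j} - 1$, a similar computation, using the fact that

$$(\tau - \lambda_i)^{k+1}(v_{i,j}) = (\tau - \lambda_i)^{e_{i,j}}(v_{i,j}) = 0$$

gives

$$\tau(b_{e_{i,j}-1}) = \lambda_i b_{e_{i,j}-1}$$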


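Thus, for this basis, the complexity is more or less spread out evenly, and the
matrix of $\tau|_{\langle\langle v_{i,j} \rangle\rangle}$ with respect to $\mathcal{C}_{i,j}$ is the $e_{i,j} \times e_{i,j}$ matrix

$$\mathcal{J}(\lambda_i, e_{i,j}) = \begin{bmatrix} \lambda_i & 0 & \cdots & \cdots & 0 \\ 1 & \lambda_i & \ddots & & \vdots \\ 0 & 1 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & \lambda_i & 0 \\ 0 & \cdots & 0 & 1 & \lambda_i \end{bmatrix}$$

which is called a Jordan block associated with the scalar $\lambda_i$. Note that a Jordan
block has $\lambda_i$'s on the main diagonal, 1's on the subdiagonal and 0's elsewhere.
Let us refer to the basis

$$\mathcal{C} = \bigcup \mathcal{C}_{i,j}$$

as a Jordan basis for $\tau$.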
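The shape of a Jordan block is simple enough to generate mechanically. The following Python sketch builds a block with the convention used here (1's on the subdiagonal) and checks that subtracting $\lambda$ from the diagonal leaves a nilpotent matrix. The function names `jordan_block` and `nilpotency_index` are assumptions made for the example.

```python
def jordan_block(lam, e):
    """e x e Jordan block: lam on the diagonal, 1's on the subdiagonal."""
    return [[lam if i == j else (1 if i == j + 1 else 0) for j in range(e)]
            for i in range(e)]


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def nilpotency_index(N):
    """Smallest k >= 1 with N^k = 0, or None if N is not nilpotent.

    An n x n nilpotent matrix satisfies N^n = 0, so n powers suffice.
    """
    n = len(N)
    zero = [[0] * n for _ in range(n)]
    power = N
    for k in range(1, n + 1):
        if power == zero:
            return k
        power = mat_mul(power, N)
    return None


J = jordan_block(5, 3)
# Subtract 5 times the identity to isolate the subdiagonal of 1's.
N = [[J[i][j] - (5 if i == j else 0) for j in range(3)] for i in range(3)]
index = nilpotency_index(N)
```

For the $3 \times 3$ block with $\lambda = 5$, the shifted matrix has nilpotency index 3, matching the size of the block.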


Theorem 8.6 (The Jordan canonical form) Suppose that the minimal
polynomial of $\tau \in \mathcal{L}(V)$ splits over the base field $F$, that is,

$$m_\tau(x) = (x - \lambda_1)^{e_1} \cdots (x - \lambda_n)^{e_n}$$

where $\lambda_i \in F$.

1) The matrix of $\tau$ with respect to a Jordan basis $\mathcal{C}$ is

$$\operatorname{diag}(\mathcal{J}(\lambda_1, e_{1,1}), \dots, \mathcal{J}(\lambda_1, e_{1,k_1}), \dots, \mathcal{J}(\lambda_n, e_{n,1}), \dots, \mathcal{J}(\lambda_n, e_{n,k_n}))$$

where the polynomials $(x - \lambda_i)^{e_{i,j}}$ are the elementary divisors of $\tau$. This
block diagonal matrix is said to be in Jordan canonical form and is called
the Jordan canonical form of $\tau$.

2) If $F$ is algebraically closed, then up to order of the block diagonal
matrices, the set of matrices in Jordan canonical form constitutes a set of
canonical forms for similarity.

Proof. For part 2), the companion matrix and corresponding Jordan block are
similar:

$$C[(x - \lambda_i)^{e_{i,j}}] \sim \mathcal{J}(\lambda_i, e_{i,j})$$

since they both represent the same operator $\tau$ on the subspace $\langle\langle v_{i,j} \rangle\rangle$. It follows
that the rational canonical matrix and the Jordan canonical matrix for $\tau$ are
similar.


Note that the diagonal elements of the Jordan canonical form $J$ of $\tau$ are
precisely the eigenvalues of $\tau$, each appearing a number of times equal to its
algebraic multiplicity. In general, the rational canonical form does not “expose”
the eigenvalues of the matrix, even when these eigenvalues lie in the base field.


Triangularizability and Schur's Lemma 


We have discussed two different canonical forms for similarity: the rational
canonical form, which applies in all cases, and the Jordan canonical form, which
applies only when the base field is algebraically closed. Moreover, there is an
annoying sense in which these sets of canonical forms leave something to be
desired: One is too complex and the other does not always exist.


Let us now drop the rather strict requirements of canonical forms and look at 
two classes of matrices that are too large to be canonical forms (the upper 
triangular matrices and the almost upper triangular matrices) and one class of 
matrices that is too small to be a canonical form (the diagonal matrices). 


The upper triangular matrices (or lower triangular matrices) have some nice 
algebraic properties and it is of interest to know when an arbitrary matrix is 
similar to a triangular matrix. We confine our attention to upper triangular 
matrices, since there are direct analogs for lower triangular matrices as well. 


Definition A linear operator $\tau \in \mathcal{L}(V)$ is upper triangularizable if there is an
ordered basis $\mathcal{B} = (v_1, \dots, v_n)$ of $V$ for which the matrix $[\tau]_{\mathcal{B}}$ is upper
triangular, or equivalently, if

$$\tau v_i \in \langle v_1, \dots, v_i \rangle$$

for all $i = 1, \dots, n$. □


As we will see next, when the base field is algebraically closed, all operators are 
upper triangularizable. However, since two distinct upper triangular matrices 
can be similar, the class of upper triangular matrices is not a canonical form for 
similarity. Simply put, there are just too many upper triangular matrices. 


Theorem 8.7 (Schur's theorem) Let $V$ be a finite-dimensional vector space
over a field $F$.

1) If the characteristic polynomial (or minimal polynomial) of $\tau \in \mathcal{L}(V)$ splits
over $F$, then $\tau$ is upper triangularizable.

2) If $F$ is algebraically closed, then all operators are upper triangularizable.

Proof. Part 2) follows from part 1). The proof of part 1) is most easily
accomplished by matrix means, namely, we prove that every square matrix
$A \in \mathcal{M}_n(F)$ whose characteristic polynomial splits over $F$ is similar to an upper
triangular matrix. If $n = 1$ there is nothing to prove, since all $1 \times 1$ matrices are
upper triangular. Assume the result is true for $n - 1$ and let $A \in \mathcal{M}_n(F)$.

Let $v_1$ be an eigenvector associated with the eigenvalue $\lambda_1 \in F$ of $A$ and extend
$\{v_1\}$ to an ordered basis $\mathcal{B} = (v_1, \dots, v_n)$ for $F^n$. The matrix of $\tau_A$ with respect
to $\mathcal{B}$ has the form

$$[\tau_A]_{\mathcal{B}} = \begin{bmatrix} \lambda_1 & * \\ 0 & A_1 \end{bmatrix}_{\text{block}}$$

for some $A_1 \in \mathcal{M}_{n-1}(F)$. Since $[\tau_A]_{\mathcal{B}}$ and $A$ are similar, we have

$$\det(xI - A) = \det(xI - [\tau_A]_{\mathcal{B}}) = (x - \lambda_1)\det(xI - A_1)$$

Hence, the characteristic polynomial of $A_1$ also splits over $F$ and the induction
hypothesis implies that there is an invertible matrix $P \in \mathcal{M}_{n-1}(F)$ for which

$$U = PA_1P^{-1}$$

is upper triangular. Hence, if

$$Q = \begin{bmatrix} 1 & 0 \\ 0 & P \end{bmatrix}_{\text{block}}$$

then $Q$ is invertible and

$$Q[\tau_A]_{\mathcal{B}}Q^{-1} = \begin{bmatrix} 1 & 0 \\ 0 & P \end{bmatrix} \begin{bmatrix} \lambda_1 & * \\ 0 & A_1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & P^{-1} \end{bmatrix} = \begin{bmatrix} \lambda_1 & * \\ 0 & U \end{bmatrix}$$

is upper triangular. □


The Real Case 


When the base field is $F = \mathbb{R}$, an operator $\tau$ is upper triangularizable if and
only if its characteristic polynomial splits over $\mathbb{R}$. (Why?) We can, however,
always achieve a form that is close to triangular by permitting values on the first
subdiagonal.


Before proceeding, let us recall Theorem 7.11, which says that for a module $W_\tau$
of prime order $m_\tau(x) = p(x)$, the following are equivalent:

1) $W_\tau$ is cyclic
2) $W_\tau$ is indecomposable
3) $c_\tau(x)$ is irreducible
4) $\tau$ is nonderogatory, that is, $c_\tau(x) = m_\tau(x)$
5) $\dim(W_\tau) = \deg(p(x))$.




Now suppose that $F = \mathbb{R}$ and $c_\tau(x) = x^2 + sx + t$ is an irreducible quadratic.
If $\mathcal{B}$ is a $\tau$-cyclic basis for $W_\tau$, then

$$[\tau]_{\mathcal{B}} = \begin{bmatrix} 0 & -t \\ 1 & -s \end{bmatrix}$$

However, there is a more appealing matrix representation of $\tau$. To this end, let
$A$ be the matrix above. As a complex matrix, $A$ has two distinct eigenvalues:

$$\lambda_\pm = -\frac{s}{2} \pm i\,\frac{\sqrt{4t - s^2}}{2}$$

Now, a matrix of the form

$$B = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}$$

has characteristic polynomial $q(x) = (x - a)^2 + b^2$ and eigenvalues $a \pm ib$. So
if we set

$$a = -\frac{s}{2} \quad\text{and}\quad b = -\frac{\sqrt{4t - s^2}}{2}$$

then $B$ has the same two distinct eigenvalues as $A$ and so $A$ and $B$ have the
same Jordan canonical form over $\mathbb{C}$. It follows that $A$ and $B$ are similar over $\mathbb{C}$
and therefore also over $\mathbb{R}$, by Theorem 7.20. Thus, there is an ordered basis $\mathcal{C}$
for which $[\tau]_{\mathcal{C}} = B$.
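The choice of $a$ and $b$ can be sanity-checked numerically: a $2 \times 2$ matrix has characteristic polynomial $x^2 - \operatorname{tr}\,x + \det$, so two such matrices share a characteristic polynomial exactly when they share trace and determinant. The Python sketch below is only an illustration; the function names and the sample values $s = 2$, $t = 5$ are assumptions.

```python
import math


def rotation_scaling_entries(s, t):
    """Return (a, b) for the irreducible real quadratic x^2 + s x + t.

    Irreducibility over R means s^2 < 4t; b carries the same sign
    convention as in the text.
    """
    disc = 4 * t - s * s
    if disc <= 0:
        raise ValueError("x^2 + s x + t is not irreducible over R")
    return -s / 2, -math.sqrt(disc) / 2


def trace2(M):
    return M[0][0] + M[1][1]


def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


s, t = 2.0, 5.0
a, b = rotation_scaling_entries(s, t)

companion = [[0.0, -t], [1.0, -s]]   # matrix with respect to a cyclic basis
B = [[a, -b], [b, a]]                # the more appealing form
```

With these values $a = -1$ and $b = -2$, and both matrices have trace $-2 = -s$ and determinant $5 = t$, so both have characteristic polynomial $x^2 + 2x + 5$.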


Theorem 8.8 If $F = \mathbb{R}$ and $W_\tau$ is cyclic and $\deg(c_\tau(x)) = 2$, then there is an
ordered basis $\mathcal{C}$ for which

$$[\tau]_{\mathcal{C}} = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}$$ □


Now we can proceed with the real version of Schur's theorem. For the sake of 
the exposition, we make the following definition. 


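Definition A matrix $A \in \mathcal{M}_n(F)$ is almost upper triangular if it has the form

$$A = \begin{bmatrix} A_1 & & & * \\ & A_2 & & \\ & & \ddots & \\ 0 & & & A_k \end{bmatrix}_{\text{block}}$$

where

$$A_i = [a] \quad\text{or}\quad A_i = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}$$

for $a, b \in F$. A linear operator $\tau \in \mathcal{L}(V)$ is almost upper triangularizable if
there is an ordered basis $\mathcal{B}$ for which $[\tau]_{\mathcal{B}}$ is almost upper triangular. □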


To see that every real linear operator is almost upper triangularizable, we use
Theorem 7.19, which states that if $p(x)$ is a prime factor of $c_\tau(x)$, then $V_\tau$ has a
cyclic submodule $W_\tau$ of order $p(x)$. Hence, $W$ is a $\tau$-cyclic subspace of
dimension $\deg(p(x))$ and $\tau|_W$ has characteristic polynomial $p(x)$.


Now, the minimal polynomial of a real operator $\tau \in \mathcal{L}(V)$ factors into a product
of linear and irreducible quadratic factors. If $c_\tau(x)$ has a linear factor over $F$,
then $V_\tau$ has a one-dimensional $\tau$-invariant subspace $W$. If $c_\tau(x)$ has an
irreducible quadratic factor $p(x)$, then $V_\tau$ has a cyclic submodule $W_\tau$ of order
$p(x)$ and so a matrix representation of $\tau$ on $W_\tau$ is given by the matrix

$$A = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}$$

This is the basis for an inductive proof, as in the complex case.


Theorem 8.9 (Schur's theorem: real case) If $V$ is a real vector space, then
every linear operator on $V$ is almost upper triangularizable.

Proof. As with the complex case, it is simpler to proceed using matrices, by
showing that any $n \times n$ real matrix $A$ is similar to an almost upper triangular
matrix. The result is clear if $n = 1$. Assume for the purposes of induction that
any square matrix of size less than $n \times n$ is almost upper triangularizable.

We have just seen that $F^n$ has a one-dimensional $\tau_A$-invariant subspace $W$ or a
two-dimensional $\tau_A$-cyclic subspace $W$, where $\tau_A$ has irreducible characteristic
polynomial on $W$. Hence, we may choose a basis $\mathcal{B}$ for $F^n$ for which the first
one or first two vectors are a basis for $W$. Then

$$[\tau_A]_{\mathcal{B}} = \begin{bmatrix} A_1 & * \\ 0 & A_2 \end{bmatrix}_{\text{block}}$$

where

$$A_1 = [a] \quad\text{or}\quad A_1 = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}$$

and $A_2$ has size $k \times k$. The induction hypothesis applied to $A_2$ gives an
invertible matrix $P \in \mathcal{M}_k$ for which

$$U = PA_2P^{-1}$$

is almost upper triangular. Hence, if

$$Q = \begin{bmatrix} I_{n-k} & 0 \\ 0 & P \end{bmatrix}_{\text{block}}$$

then $Q$ is invertible and

$$Q[\tau_A]_{\mathcal{B}}Q^{-1} = \begin{bmatrix} I_{n-k} & 0 \\ 0 & P \end{bmatrix} \begin{bmatrix} A_1 & * \\ 0 & A_2 \end{bmatrix} \begin{bmatrix} I_{n-k} & 0 \\ 0 & P^{-1} \end{bmatrix} = \begin{bmatrix} A_1 & * \\ 0 & U \end{bmatrix}$$

is almost upper triangular. □


Unitary Triangularizability 


Although we have not yet discussed inner product spaces and orthonormal
bases, the reader may very well be familiar with these concepts. For those who
are, we mention that when $V$ is a real or complex inner product space, then if an
operator $\tau$ on $V$ can be triangularized (or almost triangularized) using an
ordered basis $\mathcal{B}$, it can also be triangularized (or almost triangularized) using an
orthonormal ordered basis $\mathcal{O}$.


To see this, suppose we apply the Gram-Schmidt orthogonalization process to a
basis $\mathcal{B} = (v_1, \dots, v_n)$ that triangularizes (or almost triangularizes) $\tau$. The
resulting ordered orthonormal basis $\mathcal{O} = (u_1, \dots, u_n)$ has the property that

$$\langle v_1, \dots, v_i \rangle = \langle u_1, \dots, u_i \rangle$$

for all $i \le n$. Since $[\tau]_{\mathcal{B}}$ is (almost) upper triangular, that is,

$$\tau v_i \in \langle v_1, \dots, v_i \rangle$$

for all $i \le n$, it follows that

$$\tau u_i \in \langle \tau v_1, \dots, \tau v_i \rangle \subseteq \langle v_1, \dots, v_i \rangle = \langle u_1, \dots, u_i \rangle$$

and so the matrix $[\tau]_{\mathcal{O}}$ is also (almost) upper triangular.

A linear operator $\tau$ is unitarily upper triangularizable if there is an ordered
orthonormal basis with respect to which $\tau$ is upper triangular. Accordingly,
when $V$ is an inner product space, we can replace the term “upper
triangularizable” with “unitarily upper triangularizable” in Schur's theorem. (A
similar statement holds for almost upper triangular matrices.)


Diagonalizable Operators 


Definition A linear operator $\tau \in \mathcal{L}(V)$ is diagonalizable if there is an ordered
basis $\mathcal{B} = (v_1, \dots, v_n)$ of $V$ for which the matrix $[\tau]_{\mathcal{B}}$ is diagonal, or
equivalently, if

$$\tau v_i = \lambda_i v_i$$

for all $i = 1, \dots, n$. □


The previous definition leads immediately to the following simple 
characterization of diagonalizable operators. 


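Theorem 8.10 Let $\tau \in \mathcal{L}(V)$. The following are equivalent:

1) $\tau$ is diagonalizable.
2) $V$ has a basis consisting entirely of eigenvectors of $\tau$.
3) $V$ has the form

$$V = \mathcal{E}_{\lambda_1} \oplus \cdots \oplus \mathcal{E}_{\lambda_k}$$

where $\lambda_1, \dots, \lambda_k$ are the distinct eigenvalues of $\tau$. □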


Diagonalizable operators can also be characterized in a simple way via their 
minimal polynomials. 


Theorem 8.11 A linear operator $\tau \in \mathcal{L}(V)$ on a finite-dimensional vector space
is diagonalizable if and only if its minimal polynomial is the product of distinct
linear factors.

Proof. If $\tau$ is diagonalizable, then

$$V = \mathcal{E}_{\lambda_1} \oplus \cdots \oplus \mathcal{E}_{\lambda_k}$$

and Theorem 7.7 implies that $m_\tau(x)$ is the least common multiple of the
minimal polynomials $x - \lambda_i$ of $\tau$ restricted to $\mathcal{E}_{\lambda_i}$. Hence, $m_\tau(x)$ is a product of
distinct linear factors. Conversely, if $m_\tau(x)$ is a product of distinct linear
factors, then the primary decomposition of $V$ has the form

$$V = V_1 \oplus \cdots \oplus V_k$$

where

$$V_i = \{v \in V \mid (\tau - \lambda_i)v = 0\} = \mathcal{E}_{\lambda_i}$$

and so $\tau$ is diagonalizable. □
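Theorem 8.11 gives a concrete test once the eigenvalues are known: the operator is diagonalizable exactly when the product of the factors $A - \lambda_i I$, one for each distinct eigenvalue, is the zero matrix. The Python sketch below illustrates this on two small integer matrices; the helper names and the matrices are assumptions chosen for the example.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def shift(A, lam):
    """Return A - lam * I."""
    return [[A[i][j] - (lam if i == j else 0) for j in range(len(A))]
            for i in range(len(A))]


def is_zero(M):
    return all(entry == 0 for row in M for entry in row)


# Eigenvalues 1 and 3: (x - 1)(x - 3) annihilates A, so A is diagonalizable.
A = [[2, 1], [1, 2]]
prod = mat_mul(shift(A, 1), shift(A, 3))

# A Jordan block with the single eigenvalue 2: x - 2 does not annihilate N,
# but (x - 2)^2 does, so the minimal polynomial has a repeated factor.
N = [[2, 0], [1, 2]]
N_shift = shift(N, 2)
N_shift_squared = mat_mul(N_shift, N_shift)
```

Here `prod` is the zero matrix, while `N_shift` is nonzero and only its square vanishes, so `N` is not diagonalizable.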
Spectral Resolutions 


We have seen (Theorem 2.25) that resolutions of the identity on a vector space
$V$ correspond to direct sum decompositions of $V$. We can do something similar
for any diagonalizable linear operator $\tau$ on $V$ (not just the identity operator).
Suppose that $\tau$ has the form

$$\tau = \lambda_1\rho_1 + \cdots + \lambda_k\rho_k$$

where $\rho_1 + \cdots + \rho_k = \iota$ is a resolution of the identity and the $\lambda_i \in F$ are
distinct. This is referred to as a spectral resolution of $\tau$.




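We claim that the $\lambda_i$'s are the eigenvalues of $\tau$ and $\operatorname{im}(\rho_i) = \mathcal{E}_{\lambda_i}$. Theorem 2.25
implies that

$$V = \operatorname{im}(\rho_1) \oplus \cdots \oplus \operatorname{im}(\rho_k)$$

If $\rho_i v \in \operatorname{im}(\rho_i)$, then

$$\tau(\rho_i v) = (\lambda_1\rho_1 + \cdots + \lambda_k\rho_k)\rho_i v = \lambda_i(\rho_i v)$$

and so $\rho_i v \in \mathcal{E}_{\lambda_i}$. Hence, $\operatorname{im}(\rho_i) \subseteq \mathcal{E}_{\lambda_i}$ and so

$$V = \operatorname{im}(\rho_1) \oplus \cdots \oplus \operatorname{im}(\rho_k) \subseteq \mathcal{E}_{\lambda_1} \oplus \cdots \oplus \mathcal{E}_{\lambda_k} \subseteq V$$

which implies that $\operatorname{im}(\rho_i) = \mathcal{E}_{\lambda_i}$ and

$$V = \mathcal{E}_{\lambda_1} \oplus \cdots \oplus \mathcal{E}_{\lambda_k}$$

The converse also holds, for if $V = \mathcal{E}_{\lambda_1} \oplus \cdots \oplus \mathcal{E}_{\lambda_k}$ and if $\rho_i$ is projection onto
$\mathcal{E}_{\lambda_i}$ along the direct sum of the other eigenspaces, then

$$\rho_1 + \cdots + \rho_k = \iota$$

and since $\tau\rho_i = \lambda_i\rho_i$, it follows that

$$\tau = \tau(\rho_1 + \cdots + \rho_k) = \lambda_1\rho_1 + \cdots + \lambda_k\rho_k$$

Theorem 8.12 A linear operator $\tau \in \mathcal{L}(V)$ is diagonalizable if and only if it
has a spectral resolution

$$\tau = \lambda_1\rho_1 + \cdots + \lambda_k\rho_k$$

In this case, $\{\lambda_1, \dots, \lambda_k\}$ is the spectrum of $\tau$ and

$$\operatorname{im}(\rho_i) = \mathcal{E}_{\lambda_i} \quad\text{and}\quad \ker(\rho_i) = \bigoplus_{j \ne i} \mathcal{E}_{\lambda_j}$$ □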
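For a diagonalizable matrix with known distinct eigenvalues, the projections in a spectral resolution can be written down explicitly. The sketch below uses the Lagrange-type formula $\rho_i = \prod_{j \ne i} (A - \lambda_j I)/(\lambda_i - \lambda_j)$, which is a standard device but is not taken from the text. The function name `spectral_projections` and the sample matrix are assumptions, and exact rational arithmetic is used so that the identities can be checked without rounding.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def spectral_projections(A, eigenvalues):
    """Projections rho_i for a diagonalizable A with the given distinct eigenvalues.

    Assumption: A really is diagonalizable and `eigenvalues` lists each
    distinct eigenvalue exactly once.
    """
    n = len(A)
    projections = []
    for lam_i in eigenvalues:
        P = [[Fraction(int(r == c)) for c in range(n)] for r in range(n)]
        for lam_j in eigenvalues:
            if lam_j != lam_i:
                factor = [[(Fraction(A[r][c]) - (lam_j if r == c else 0))
                           / (lam_i - lam_j) for c in range(n)]
                          for r in range(n)]
                P = mat_mul(P, factor)
        projections.append(P)
    return projections


A = [[2, 1], [1, 2]]                      # eigenvalues 1 and 3
rho1, rho2 = spectral_projections(A, [1, 3])

resolution_sum = [[rho1[r][c] + rho2[r][c] for c in range(2)] for r in range(2)]
recombined = [[1 * rho1[r][c] + 3 * rho2[r][c] for c in range(2)] for r in range(2)]
cross = mat_mul(rho1, rho2)
```

For this matrix `resolution_sum` is the identity, `cross` is zero, and `recombined` reproduces `A`, which are exactly the conditions defining a spectral resolution.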
Exercises 


1. Let J be the n x n matrix all of whose entries are equal to 1. Find the 
minimal polynomial and characteristic polynomial of J and the 
eigenvalues. 

2. Prove that the eigenvalues of a matrix do not form a complete set of 

invariants under similarity. 

3. Show that $\tau \in \mathcal{L}(V)$ is invertible if and only if 0 is not an eigenvalue of $\tau$.

4. Let $A$ be an $n \times n$ matrix over a field $F$ that contains all roots of the
characteristic polynomial of $A$. Prove that $\det(A)$ is the product of the
eigenvalues of $A$, counting multiplicity.

5. Show that if $\lambda$ is an eigenvalue of $\tau$, then $p(\lambda)$ is an eigenvalue of $p(\tau)$, for
any polynomial $p(x)$. Also, if $\lambda \ne 0$, then $\lambda^{-1}$ is an eigenvalue for $\tau^{-1}$.

6. An operator $\tau \in \mathcal{L}(V)$ is nilpotent if $\tau^n = 0$ for some positive $n \in \mathbb{N}$.

a) Show that if $\tau$ is nilpotent, then the spectrum of $\tau$ is $\{0\}$.

b) Find a nonnilpotent operator $\tau$ with spectrum $\{0\}$.

7. Show that if $\sigma, \tau \in \mathcal{L}(V)$ and one of $\sigma$ and $\tau$ is invertible, then $\sigma\tau \sim \tau\sigma$
and so $\sigma\tau$ and $\tau\sigma$ have the same eigenvalues, counting multiplicity.

8. (Halmos)

a) Find a linear operator $\tau$ that is not idempotent but for which
$\tau^2(\iota - \tau) = 0$.

b) Find a linear operator $\tau$ that is not idempotent but for which
$\tau(\iota - \tau)^2 = 0$.

c) Prove that if $\tau^2(\iota - \tau) = \tau(\iota - \tau)^2 = 0$, then $\tau$ is idempotent.

9. An involution is a linear operator $\theta$ for which $\theta^2 = \iota$. If $\tau$ is idempotent
what can you say about $2\tau - \iota$? Construct a one-to-one correspondence
between the set of idempotents on $V$ and the set of involutions.

10. Let $A, B \in \mathcal{M}_2(\mathbb{C})$ and suppose that $A^2 = B^2 = I$, $ABA = B^{-1}$ but
$A \ne I$ and $B \ne I$. Show that if $C \in \mathcal{M}_2(\mathbb{C})$ commutes with both $A$ and $B$,
then $C = rI$ for some scalar $r \in \mathbb{C}$.

11. Let $\tau \in \mathcal{L}(V)$ and let

$$S = \langle v, \tau v, \dots, \tau^{de-1}v \rangle$$

be a $\tau$-cyclic submodule of $V_\tau$, with minimal polynomial $p(x)^e$ where $p(x)$
is prime of degree $d$. Let $\sigma = p(\tau)$ restricted to $\langle\langle v \rangle\rangle$. Show that $S$ is the
direct sum of $d$ $\sigma$-cyclic submodules each of dimension $e$, that is,

$$S = T_1 \oplus \cdots \oplus T_d$$

Hint: For each $0 \le i < d$, consider the set

$$\mathcal{B}_i = \{\tau^i v, p(\tau)\tau^i v, \dots, p(\tau)^{e-1}\tau^i v\}$$

12. Fix $\epsilon > 0$. Show that any complex matrix is similar to a matrix that looks
just like a Jordan matrix except that the entries that are equal to 1 are
replaced by entries with value $\epsilon$, where $\epsilon$ is any complex number. Thus, any
complex matrix is similar to a matrix that is “almost” diagonal. Hint:
consider the fact that

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & \epsilon & 0 \\ 0 & 0 & \epsilon^2 \end{bmatrix} \begin{bmatrix} \lambda & 0 & 0 \\ 1 & \lambda & 0 \\ 0 & 1 & \lambda \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & \epsilon^{-1} & 0 \\ 0 & 0 & \epsilon^{-2} \end{bmatrix} = \begin{bmatrix} \lambda & 0 & 0 \\ \epsilon & \lambda & 0 \\ 0 & \epsilon & \lambda \end{bmatrix}$$

13. Show that the Jordan canonical form is not very robust in the sense that a
small change in the entries of a matrix $A$ may result in a large jump in the
entries of the Jordan form $J$. Hint: consider the matrix

$$A_\epsilon = \begin{bmatrix} \epsilon & 0 \\ 1 & 0 \end{bmatrix}$$

What happens to the Jordan form of $A_\epsilon$ as $\epsilon \to 0$?




14. Give an example of a complex nonreal matrix all of whose eigenvalues are 
real. Show that any such matrix is similar to a real matrix. What about the 
type of the invertible matrices that are used to bring the matrix to Jordan 
form? 

15. Let $J = [\tau]_{\mathcal{B}}$ be the Jordan form of a linear operator $\tau \in \mathcal{L}(V)$. For a given
Jordan block $\mathcal{J}(\lambda, e)$ of $J$ let $U$ be the subspace of $V$ spanned by the basis
vectors of $\mathcal{B}$ associated with that block.

a) Show that $\tau|_U$ has a single eigenvalue $\lambda$ with geometric multiplicity 1.
In other words, there is essentially only one eigenvector (up to scalar
multiple) associated with each Jordan block. Hence, the geometric
multiplicity of $\lambda$ for $\tau$ is the number of Jordan blocks for $\lambda$. Show that
the algebraic multiplicity is the sum of the dimensions of the Jordan
blocks associated with $\lambda$.

b) Show that the number of Jordan blocks in $J$ is the maximum number
of linearly independent eigenvectors of $\tau$.

c) What can you say about the Jordan blocks if the algebraic multiplicity
of every eigenvalue is equal to its geometric multiplicity?

16. Assume that the base field $F$ is algebraically closed. Then assuming that the
eigenvalues of a matrix $A$ are known, it is possible to determine the Jordan
form $J$ of $A$ by looking at the rank of various matrix powers. A matrix $B$ is
nilpotent if $B^n = 0$ for some $n > 0$. The smallest such exponent is called
the index of nilpotence.

a) Let $J = \mathcal{J}(\lambda, n)$ be a single Jordan block of size $n \times n$. Show that
$J - \lambda I$ is nilpotent of index $n$. Thus, $n$ is the smallest integer for
which $\operatorname{rk}(J - \lambda I)^n = 0$.

Now let $J$ be a matrix in Jordan form but possessing only one eigenvalue
$\lambda$.

b) Show that $J - \lambda I$ is nilpotent. Let $m$ be its index of nilpotence. Show
that $m$ is the maximum size of the Jordan blocks of $J$ and that
$\operatorname{rk}(J - \lambda I)^{m-1}$ is the number of Jordan blocks in $J$ of maximum size.

c) Show that $\operatorname{rk}(J - \lambda I)^{m-2}$ is equal to 2 times the number of Jordan
blocks of maximum size plus the number of Jordan blocks of size one
less than the maximum.

d) Show that the sequence $\operatorname{rk}(J - \lambda I)^k$ for $k = 1, \dots, m$ uniquely
determines the number and size of all of the Jordan blocks in $J$, that is,
it uniquely determines $J$ up to the order of the blocks.

e) Now let $J$ be an arbitrary Jordan matrix. If $\lambda$ is an eigenvalue for $J$
show that the sequence $\operatorname{rk}(J - \lambda I)^k$ for $k = 1, \dots, m$ where $m$ is the
first integer for which $\operatorname{rk}(J - \lambda I)^m = \operatorname{rk}(J - \lambda I)^{m+1}$ uniquely
determines $J$ up to the order of the blocks.

f) Prove that for any matrix $A$ with spectrum $\{\lambda_1, \dots, \lambda_s\}$ the sequence
$\operatorname{rk}(A - \lambda_i I)^k$ for $i = 1, \dots, s$ and $k = 1, \dots, m$ where $m$ is the first
integer for which $\operatorname{rk}(A - \lambda_i I)^m = \operatorname{rk}(A - \lambda_i I)^{m+1}$ uniquely
determines the Jordan matrix $J$ for $A$ up to the order of the blocks.

17. Let $A \in \mathcal{M}_n(F)$.

a) If all the roots of the characteristic polynomial of $A$ lie in $F$ prove that
$A$ is similar to its transpose $A^t$. Hint: Let $B$ be the matrix

$$B = \begin{bmatrix} 0 & \cdots & 0 & 1 \\ \vdots & & 1 & 0 \\ 0 & & & \vdots \\ 1 & 0 & \cdots & 0 \end{bmatrix}$$

with 1's on the diagonal that moves up from left to right and 0's
elsewhere. Let $J$ be a Jordan block of the same size as $B$. Show that
$BJB^{-1} = J^t$.

b) Let $A, B \in \mathcal{M}_n(F)$. Let $K$ be a field containing $F$. Show that if $A$ and
$B$ are similar over $K$, that is, if $B = PAP^{-1}$ where $P \in \mathcal{M}_n(K)$, then
$A$ and $B$ are also similar over $F$, that is, there exists $Q \in \mathcal{M}_n(F)$ for
which $B = QAQ^{-1}$.

c) Show that any matrix is similar to its transpose.


The Trace of a Matrix 


18. Let $A, B, C \in \mathcal{M}_n(F)$. Verify the following statements.

a) $\operatorname{tr}(rA) = r\operatorname{tr}(A)$, for $r \in F$.

b) $\operatorname{tr}(A + B) = \operatorname{tr}(A) + \operatorname{tr}(B)$.

c) $\operatorname{tr}(AB) = \operatorname{tr}(BA)$.

d) $\operatorname{tr}(ABC) = \operatorname{tr}(CAB) = \operatorname{tr}(BCA)$. Find an example to show that
$\operatorname{tr}(ABC)$ may not equal $\operatorname{tr}(ACB)$.

e) The trace is an invariant under similarity.

f) If $F$ is algebraically closed, then the trace of $A$ is the sum of the
eigenvalues of $A$.

19. Use the concept of the trace of a matrix, as defined in the previous exercise,
to prove that there are no matrices $A, B \in \mathcal{M}_n(\mathbb{C})$ for which

$$AB - BA = I$$

20. Let $T \colon \mathcal{M}_n(F) \to F$ be a function with the following properties. For all
matrices $A, B \in \mathcal{M}_n(F)$ and $r \in F$,

1) $T(rA) = rT(A)$

2) $T(A + B) = T(A) + T(B)$

3) $T(AB) = T(BA)$

Show that there exists $s \in F$ for which $T(A) = s\operatorname{tr}(A)$, for all
$A \in \mathcal{M}_n(F)$.


Commuting Operators 


Let

$$\mathcal{F} = \{\tau_i \in \mathcal{L}(V) \mid i \in I\}$$

be a family of operators on a vector space $V$. Then $\mathcal{F}$ is a commuting family if
every pair of operators commutes, that is, $\sigma\tau = \tau\sigma$ for all $\sigma, \tau \in \mathcal{F}$. A subspace
$U$ of $V$ is $\mathcal{F}$-invariant if it is $\tau$-invariant for every $\tau \in \mathcal{F}$. It is often of interest
to know whether a family $\mathcal{F}$ of linear operators on $V$ has a common
eigenvector, that is, a single vector $v \in V$ that is an eigenvector for every
$\sigma \in \mathcal{F}$ (the corresponding eigenvalues may be different for each operator,
however).


21. A pair of linear operators $\sigma, \tau \in \mathcal{L}(V)$ is simultaneously diagonalizable if
there is an ordered basis $\mathcal{B}$ for $V$ for which $[\tau]_{\mathcal{B}}$ and $[\sigma]_{\mathcal{B}}$ are both diagonal,
that is, $\mathcal{B}$ is an ordered basis of eigenvectors for both $\tau$ and $\sigma$. Prove that
two diagonalizable operators $\sigma$ and $\tau$ are simultaneously diagonalizable if
and only if they commute, that is, $\sigma\tau = \tau\sigma$. Hint: If $\sigma\tau = \tau\sigma$, then the
eigenspaces of $\tau$ are invariant under $\sigma$.

22. Let $\sigma, \tau \in \mathcal{L}(V)$. Prove that if $\sigma$ and $\tau$ commute, then every eigenspace of
$\sigma$ is $\tau$-invariant. Thus, if $\mathcal{F}$ is a commuting family, then every eigenspace
of any member of $\mathcal{F}$ is $\mathcal{F}$-invariant.

23. Let $\mathcal{F}$ be a family of operators in $\mathcal{L}(V)$ with the property that each operator
in $\mathcal{F}$ has a full set of eigenvalues in the base field $F$, that is, the
characteristic polynomial splits over $F$. Prove that if $\mathcal{F}$ is a commuting
family, then $\mathcal{F}$ has a common eigenvector $v \in V$.

24. What do the real matrices

$$A = \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & 2 \\ -2 & 1 \end{bmatrix}$$

have to do with the issue of common eigenvectors?


Geršgorin Disks


It is generally impossible to determine precisely the eigenvalues of a given
complex operator or matrix $A \in \mathcal{M}_n(\mathbb{C})$, for if $n \ge 5$, then the characteristic
equation has degree at least 5 and cannot in general be solved. As a result, the
approximation of eigenvalues is big business. Here we consider one aspect of
this approximation problem, which also has some interesting theoretical
consequences.


Let $A \in \mathcal{M}_n(\mathbb{C})$ and suppose that $Av = \lambda v$ where $v = (b_1, \dots, b_n)^t$. Comparing
$k$th rows gives

$$\sum_{i=1}^{n} A_{ki} b_i = \lambda b_k$$

which can also be written in the form

$$b_k(\lambda - A_{kk}) = \sum_{i \ne k} A_{ki} b_i$$

If $k$ has the property that $|b_k| \ge |b_i|$ for all $i$, we have

$$|b_k||\lambda - A_{kk}| \le \sum_{i \ne k} |A_{ki}||b_i| \le |b_k| \sum_{i \ne k} |A_{ki}|$$

and thus, since $b_k \ne 0$ for an eigenvector $v$,

$$|\lambda - A_{kk}| \le \sum_{i \ne k} |A_{ki}| \qquad (8.7)$$

The right-hand side is the sum of the absolute values of all entries in the $k$th row
of $A$ except the diagonal entry $A_{kk}$. This sum $R_k(A)$ is the $k$th deleted absolute
row sum of $A$. The inequality (8.7) says that, in the complex plane, the
eigenvalue $\lambda$ lies in the disk centered at the diagonal entry $A_{kk}$ with radius equal
to $R_k(A)$. This disk

$$GR_k(A) = \{z \in \mathbb{C} \mid |z - A_{kk}| \le R_k(A)\}$$

is called the Geršgorin row disk for the $k$th row of $A$. The union of all of the
Geršgorin row disks is called the Geršgorin row region for $A$.


Since there is no way to know in general which is the index $k$ for which
$|b_k| \ge |b_i|$, the best we can say in general is that the eigenvalues of $A$ lie in the
union of all Geršgorin row disks, that is, in the Geršgorin row region of $A$.


Similar definitions can be made for columns and since a matrix has the same
eigenvalues as its transpose, we can say that the eigenvalues of $A$ lie in the
Geršgorin column region of $A$. The Geršgorin region $G(A)$ of a matrix
$A \in \mathcal{M}_n(\mathbb{C})$ is the intersection of the Geršgorin row region and the Geršgorin
column region and we can say that all eigenvalues of $A$ lie in the Geršgorin
region of $A$. In symbols, $\operatorname{Spec}(A) \subseteq G(A)$.
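The row disks are straightforward to compute from the entries of a matrix. The Python sketch below returns the center and radius of each Geršgorin row disk and tests whether a point lies in the row region. The function names and the sample matrix (with eigenvalues 2 and 5) are assumptions made for the example.

```python
def gershgorin_row_disks(A):
    """Return a list of (center, radius) pairs, one per row of A.

    The center is the diagonal entry A[k][k]; the radius is the k-th
    deleted absolute row sum.
    """
    disks = []
    for k, row in enumerate(A):
        radius = sum(abs(entry) for i, entry in enumerate(row) if i != k)
        disks.append((row[k], radius))
    return disks


def in_row_region(z, A):
    """True if z lies in at least one Gersgorin row disk of A."""
    return any(abs(z - center) <= radius
               for center, radius in gershgorin_row_disks(A))


A = [[4, 1], [2, 3]]          # trace 7, determinant 10: eigenvalues 2 and 5
disks = gershgorin_row_disks(A)
```

The two disks are centered at 4 with radius 1 and at 3 with radius 2. Both eigenvalues lie in the row region, while a point such as 0 does not. Applying the same function to the transpose gives the column disks.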


25. Find and sketch the Geršgorin region and the eigenvalues for the matrix

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$$

26. A matrix $A \in \mathcal{M}_n(\mathbb{C})$ is diagonally dominant if for each $k = 1, \dots, n$,

$$|A_{kk}| \ge R_k(A)$$

and it is strictly diagonally dominant if strict inequality holds. Prove that
if $A$ is strictly diagonally dominant, then it is invertible.

27. Find a matrix $A \in \mathcal{M}_n(\mathbb{C})$ that is diagonally dominant but not invertible.

28. Find a matrix $A \in \mathcal{M}_n(\mathbb{C})$ that is invertible but not strictly diagonally
dominant.


Chapter 9 
Real and Complex Inner Product Spaces 


We now turn to a discussion of real and complex vector spaces that have an
additional function defined on them, called an inner product, as described in the
following definition. In this chapter, $F$ will denote either the real or complex
field. Also, the complex conjugate of $r \in \mathbb{C}$ is denoted by $\bar{r}$.


Definition Let $V$ be a vector space over $F = \mathbb{R}$ or $F = \mathbb{C}$. An inner product
on $V$ is a function $\langle\ ,\ \rangle \colon V \times V \to F$ with the following properties:

1) (Positive definiteness) For all $v \in V$,

$$\langle v, v \rangle \ge 0 \quad\text{and}\quad \langle v, v \rangle = 0 \Leftrightarrow v = 0$$

2) For $F = \mathbb{C}$: (Conjugate symmetry)

$$\langle u, v \rangle = \overline{\langle v, u \rangle}$$

For $F = \mathbb{R}$: (Symmetry)

$$\langle u, v \rangle = \langle v, u \rangle$$

3) (Linearity in the first coordinate) For all $u, v, w \in V$ and $r, s \in F$

$$\langle ru + sv, w \rangle = r\langle u, w \rangle + s\langle v, w \rangle$$

A real (or complex) vector space $V$, together with an inner product, is called a
real (or complex) inner product space. □


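If $X, Y \subseteq V$, then we let

$$\langle X, Y \rangle = \{\langle x, y \rangle \mid x \in X, y \in Y\}$$

and

$$\langle v, X \rangle = \{\langle v, x \rangle \mid x \in X\}$$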


Note that a vector subspace S of an inner product space V is also an inner 
product space under the restriction of the inner product of V to S. 




We will study bilinear forms (also called inner products) on vector spaces over 
fields other than R or C in Chapter 11. Note that property 1) implies that (v, v) 
is always real, even if V is a complex vector space. 


If $F = \mathbb{R}$, then properties 2) and 3) imply that the inner product is linear in both
coordinates, that is, the inner product is bilinear. However, if $F = \mathbb{C}$, then

$$\langle w, ru + sv \rangle = \overline{\langle ru + sv, w \rangle} = \bar{r}\,\overline{\langle u, w \rangle} + \bar{s}\,\overline{\langle v, w \rangle} = \bar{r}\langle w, u \rangle + \bar{s}\langle w, v \rangle$$

This is referred to as conjugate linearity in the second coordinate. Specifically,
a function $f \colon V \to W$ between complex vector spaces is conjugate linear if

$$f(u + v) = f(u) + f(v)$$

and

$$f(ru) = \bar{r}f(u)$$

for all $u, v \in V$ and $r \in \mathbb{C}$. Thus, a complex inner product is linear in its first
coordinate and conjugate linear in its second coordinate. This is often described
by saying that a complex inner product is sesquilinear. (Sesqui means “one and
a half times.”)


Example 9.1
1) The vector space $\mathbb{R}^n$ is an inner product space under the standard inner
   product, or dot product, defined by
   $$\langle (r_1, \ldots, r_n), (s_1, \ldots, s_n) \rangle = r_1 s_1 + \cdots + r_n s_n$$
   The inner product space $\mathbb{R}^n$ is often called n-dimensional Euclidean
   space.

2) The vector space $\mathbb{C}^n$ is an inner product space under the standard inner
   product defined by
   $$\langle (r_1, \ldots, r_n), (s_1, \ldots, s_n) \rangle = r_1 \overline{s_1} + \cdots + r_n \overline{s_n}$$
   This inner product space is often called n-dimensional unitary space.

3) The vector space $C[a,b]$ of all continuous complex-valued functions on the
   closed interval $[a,b]$ is a complex inner product space under the inner
   product
   $$\langle f, g \rangle = \int_a^b f(x)\overline{g(x)}\,dx$$ □
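
The defining properties can be checked numerically for the standard inner product on $\mathbb{C}^n$. The following is a minimal sketch, assuming vectors are stored as Python lists of complex numbers; the helper name `inner` and the sample vectors are illustrative choices, not notation from the text.

```python
def inner(u, v):
    """Standard inner product on C^n: linear in u, conjugate linear in v."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

# Sample data (hypothetical, chosen with small integer parts so arithmetic is exact).
u = [1 + 2j, 3 - 1j]
v = [2 - 1j, 1j]
r = 2 + 1j

# Conjugate symmetry: <u, v> equals the conjugate of <v, u>.
conjugate_symmetric = inner(u, v) == inner(v, u).conjugate()

# Linearity in the first coordinate and conjugate linearity in the second.
linear_first = inner([r * a for a in u], v) == r * inner(u, v)
conj_linear_second = inner(u, [r * b for b in v]) == r.conjugate() * inner(u, v)

# <u, u> is real and nonnegative.
norm_sq = inner(u, u)
```

Here `inner(u, v)` evaluates to `-1+2j` and `inner(u, u)` to `15`, a real number, as positive definiteness requires.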


Example 9.2 One of the most important inner product spaces is the vector space
$\ell^2$ of all real (or complex) sequences $(s_n)$ with the property that
$$\sum_{n=0}^{\infty} |s_n|^2 < \infty$$
under the inner product
$$\langle (s_n), (t_n) \rangle = \sum_{n=0}^{\infty} s_n \overline{t_n}$$
Such sequences are called square summable. Of course, for this inner product
to make sense, the sum on the right must converge. To see this, note that if
$(s_n), (t_n) \in \ell^2$, then
$$0 \le (|s_n| - |t_n|)^2 = |s_n|^2 - 2|s_n||t_n| + |t_n|^2$$
and so
$$2|s_n \overline{t_n}| \le |s_n|^2 + |t_n|^2$$
which implies that $\sum_n |s_n \overline{t_n}| < \infty$, so the series defining the inner
product converges absolutely. We leave it to the reader to verify that $\ell^2$ is an
inner product space. □


The following simple result is quite useful. 


Lemma 9.1 If $V$ is an inner product space and $\langle u, x \rangle = \langle v, x \rangle$ for all $x \in V$,
then $u = v$. □


The next result points out one of the main differences between real and complex 
inner product spaces and will play a key role in later work. 


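Theorem 9.2 Let $V$ be an inner product space and let $\tau \in \mathcal{L}(V)$.
1) $$\langle \tau v, w \rangle = 0 \text{ for all } v, w \in V \ \Rightarrow\ \tau = 0$$
2) If $V$ is a complex inner product space, then
   $$\langle \tau v, v \rangle = 0 \text{ for all } v \in V \ \Rightarrow\ \tau = 0$$
   but this does not hold in general for real inner product spaces.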


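Proof. Part 1) follows directly from Lemma 9.1. As for part 2), let $v = rx + y$,
for $x, y \in V$ and $r \in F$. Then
$$0 = \langle \tau(rx + y), rx + y \rangle
    = |r|^2 \langle \tau x, x \rangle + \langle \tau y, y \rangle + r\langle \tau x, y \rangle + \overline{r}\langle \tau y, x \rangle
    = r\langle \tau x, y \rangle + \overline{r}\langle \tau y, x \rangle$$
Setting $r = 1$ gives
$$\langle \tau x, y \rangle + \langle \tau y, x \rangle = 0$$
and setting $r = i$ gives
$$\langle \tau x, y \rangle - \langle \tau y, x \rangle = 0$$
These two equations imply that $\langle \tau x, y \rangle = 0$ for all $x, y \in V$ and so part 1)
implies that $\tau = 0$. For the last statement, rotation by 90 degrees in the real
plane $\mathbb{R}^2$ has the property that $\langle \tau v, v \rangle = 0$ for all $v$. □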
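
The counterexample in the last sentence can be checked directly. The sketch below assumes rotation by 90 degrees is given by $\tau(x, y) = (-y, x)$; the names `inner` and `rotate90` and the test vectors are illustrative. It also applies the same formula on $\mathbb{C}^2$, where a single complex vector already shows that $\langle \tau v, v \rangle$ is not identically zero, consistent with part 2).

```python
def inner(u, v):
    """Standard inner product: linear in u, conjugate linear in v."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

def rotate90(v):
    """tau(x, y) = (-y, x), rotation by 90 degrees in the plane."""
    x, y = v
    return (-y, x)

# On R^2 the rotation sends every vector to an orthogonal one.
real_vectors = [(1, 0), (0, 1), (2, -3), (5, 7)]
real_values = [inner(rotate90(v), v) for v in real_vectors]

# The same formula on C^2: a complex vector detects that tau is nonzero.
w = (1, 1j)
complex_value = inner(rotate90(w), w)
```

Every entry of `real_values` is zero, while `complex_value` is `-2j`.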


Norm and Distance 
If $V$ is an inner product space, the norm, or length of $v \in V$ is defined by
$$\|v\| = \sqrt{\langle v, v \rangle} \qquad (9.1)$$


A vector v is a unit vector if ||v|| = 1. Here are the basic properties of the norm. 


Theorem 9.3
1) $\|v\| \ge 0$ and $\|v\| = 0$ if and only if $v = 0$.
2) For all $r \in F$ and $v \in V$,
   $$\|rv\| = |r|\,\|v\|$$
3) (The Cauchy-Schwarz inequality) For all $u, v \in V$,
   $$|\langle u, v \rangle| \le \|u\|\,\|v\|$$
   with equality if and only if one of $u$ and $v$ is a scalar multiple of the other.
4) (The triangle inequality) For all $u, v \in V$,
   $$\|u + v\| \le \|u\| + \|v\|$$
   with equality if and only if one of $u$ and $v$ is a nonnegative real scalar multiple
   of the other.
5) For all $u, v, x \in V$,
   $$\|u - v\| \le \|u - x\| + \|x - v\|$$
6) For all $u, v \in V$,
   $$\big|\,\|u\| - \|v\|\,\big| \le \|u - v\|$$
7) (The parallelogram law) For all $u, v \in V$,
   $$\|u + v\|^2 + \|u - v\|^2 = 2\|u\|^2 + 2\|v\|^2$$


Proof. We prove only Cauchy-Schwarz and the triangle inequality. For
Cauchy-Schwarz, if either $u$ or $v$ is zero the result follows, so assume that
$u, v \ne 0$. Then, for any scalar $r \in F$,
$$0 \le \|u - rv\|^2
    = \langle u - rv, u - rv \rangle
    = \langle u, u \rangle - \overline{r}\langle u, v \rangle - r\left[\langle v, u \rangle - \overline{r}\langle v, v \rangle\right]$$
Choosing $\overline{r} = \langle v, u \rangle / \langle v, v \rangle$ makes the value in the square brackets equal to 0
and so
$$0 \le \langle u, u \rangle - \frac{\langle v, u \rangle \langle u, v \rangle}{\langle v, v \rangle} = \|u\|^2 - \frac{|\langle u, v \rangle|^2}{\|v\|^2}$$
which is equivalent to the Cauchy-Schwarz inequality. Furthermore, equality
holds if and only if $\|u - rv\|^2 = 0$, that is, if and only if $u - rv = 0$, which is
equivalent to $u$ and $v$ being scalar multiples of one another.

To prove the triangle inequality, the Cauchy-Schwarz inequality gives
$$\|u + v\|^2 = \langle u + v, u + v \rangle
             = \langle u, u \rangle + \langle u, v \rangle + \langle v, u \rangle + \langle v, v \rangle
             \le \|u\|^2 + 2\|u\|\,\|v\| + \|v\|^2
             = (\|u\| + \|v\|)^2$$
from which the triangle inequality follows. The proof of the statement
concerning equality is left to the reader. □


Any vector space $V$, together with a function $\|\cdot\|\colon V \to \mathbb{R}$ that satisfies
properties 1), 2) and 4) of Theorem 9.3, is called a normed linear space and the
function $\|\cdot\|$ is called a norm. Thus, any inner product space is a normed linear
space, under the norm given by (9.1).


It is interesting to observe that the inner product on $V$ can be recovered from the
norm. Thus, knowing the length of all vectors in $V$ is equivalent to knowing all
inner products of vectors in $V$.


Theorem 9.4 (The polarization identities)
1) If $V$ is a real inner product space, then
   $$\langle u, v \rangle = \frac{1}{4}\left(\|u + v\|^2 - \|u - v\|^2\right)$$
2) If $V$ is a complex inner product space, then
   $$\langle u, v \rangle = \frac{1}{4}\left(\|u + v\|^2 - \|u - v\|^2\right) + \frac{1}{4}\,i\left(\|u + iv\|^2 - \|u - iv\|^2\right)$$
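
To illustrate how the inner product is recovered from the norm, here is a small sketch of the complex polarization identity on $\mathbb{C}^2$ with the standard inner product. The function names (`inner`, `norm_sq`, `combine`, `polarize`) and the sample vectors are assumptions made for the example only.

```python
def inner(u, v):
    """Standard inner product on C^n: linear in u, conjugate linear in v."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

def norm_sq(u):
    """Squared norm ||u||^2 = <u, u>, which is always real."""
    return inner(u, u).real

def combine(u, v, c):
    """Return the vector u + c*v."""
    return [a + c * b for a, b in zip(u, v)]

def polarize(u, v):
    """Recover <u, v> using only squared norms (complex polarization identity)."""
    real_part = (norm_sq(combine(u, v, 1)) - norm_sq(combine(u, v, -1))) / 4
    imag_part = (norm_sq(combine(u, v, 1j)) - norm_sq(combine(u, v, -1j))) / 4
    return complex(real_part, imag_part)

u = [1 + 2j, 3 - 1j]
v = [2 - 1j, 1j]
recovered = polarize(u, v)
direct = inner(u, v)
```

For these vectors the four squared norms are 19, 23, 25 and 17, so the identity gives $(19 - 23)/4 + i(25 - 17)/4 = -1 + 2i$, which agrees with the direct computation.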


The norm can be used to define the distance between any two vectors in an 
inner product space. 


Definition Let $V$ be an inner product space. The distance $d(u, v)$ between any
two vectors $u$ and $v$ in $V$ is
$$d(u, v) = \|u - v\| \qquad (9.2)$$ □


Here are the basic properties of distance. 




Theorem 9.5
1) $d(u, v) \ge 0$ and $d(u, v) = 0$ if and only if $u = v$
2) (Symmetry)
   $$d(u, v) = d(v, u)$$
3) (The triangle inequality)
   $$d(u, v) \le d(u, w) + d(w, v)$$ □


Any nonempty set $V$, together with a function $d\colon V \times V \to \mathbb{R}$ that satisfies the
properties of Theorem 9.5, is called a metric space and the function $d$ is called
a metric on $V$. Thus, any inner product space is a metric space under the metric
(9.2).


Before continuing, we should make a few remarks about our goals in this and 
the next chapter. The presence of an inner product, and hence a metric, permits 
the definition of a topology on V, and in particular, convergence of infinite 
sequences. A sequence $(v_n)$ of vectors in $V$ converges to $v \in V$ if
$$\lim_{n \to \infty} \|v_n - v\| = 0$$


Some of the more important concepts related to convergence are closedness and 
closures, completeness and the continuity of linear operators and linear 
functionals. 


In the finite-dimensional case, the situation is very straightforward: All 
subspaces are closed, all inner product spaces are complete and all linear 
operators and functionals are continuous. However, in the infinite-dimensional 
case, things are not as simple. 


Our goals in this chapter and the next are to describe some of the basic 
properties of inner product spaces—both finite and infinite-dimensional—and 
then discuss certain special types of operators (normal, unitary and self-adjoint) 
in the finite-dimensional case only. To achieve the latter goal as rapidly as 
possible, we will postpone a discussion of convergence-related properties until 
Chapter 12. This means that we must state some results only for the finite- 
dimensional case in this chapter. 


Isometries 


An isomorphism of vector spaces preserves the vector space operations. The 
corresponding concept for inner product spaces is the isometry. 


Definition Let $V$ and $W$ be inner product spaces and let $\tau \in \mathcal{L}(V, W)$.
1) $\tau$ is an isometry if it preserves the inner product, that is, if
   $$\langle \tau u, \tau v \rangle = \langle u, v \rangle$$
   for all $u, v \in V$.

2) A bijective isometry is called an isometric isomorphism. When $\tau\colon V \to W$
   is an isometric isomorphism, we say that $V$ and $W$ are isometrically
   isomorphic. □


It is clear that an isometry is injective and so it is an isometric isomorphism
provided it is surjective. Moreover, if
$$\dim(V) = \dim(W) < \infty$$
injectivity implies surjectivity and $\tau$ is an isometry if and only if $\tau$ is an
isometric isomorphism. On the other hand, the following simple example shows
that this is not the case for infinite-dimensional inner product spaces.


Example 9.3 The map $\tau\colon \ell^2 \to \ell^2$ defined by
$$\tau(x_1, x_2, x_3, \ldots) = (0, x_1, x_2, \ldots)$$
is an isometry, but it is clearly not surjective. □


Since the norm determines the inner product, the following should not come as a 
surprise. 


Theorem 9.6 A linear transformation $\tau \in \mathcal{L}(V, W)$ is an isometry if and only if
it preserves the norm, that is, if and only if
$$\|\tau v\| = \|v\|$$
for all $v \in V$.
Proof. Clearly, an isometry preserves the norm. The converse follows from the
polarization identities. In the real case, we have
$$\langle \tau u, \tau v \rangle
  = \frac{1}{4}\left(\|\tau u + \tau v\|^2 - \|\tau u - \tau v\|^2\right)
  = \frac{1}{4}\left(\|\tau(u + v)\|^2 - \|\tau(u - v)\|^2\right)
  = \frac{1}{4}\left(\|u + v\|^2 - \|u - v\|^2\right)
  = \langle u, v \rangle$$
and so $\tau$ is an isometry. The complex case is similar. □


Orthogonality 


The presence of an inner product allows us to define the concept of 
orthogonality. 




Definition Let $V$ be an inner product space.
1) Two vectors $u, v \in V$ are orthogonal, written $u \perp v$, if
   $$\langle u, v \rangle = 0$$
2) Two subsets $X, Y \subseteq V$ are orthogonal, written $X \perp Y$, if $\langle X, Y \rangle = \{0\}$,
   that is, if $x \perp y$ for all $x \in X$ and $y \in Y$. We write $v \perp X$ in place of
   $\{v\} \perp X$.

3) The orthogonal complement of a subset $X \subseteq V$ is the set
   $$X^{\perp} = \{v \in V \mid v \perp X\}$$ □
The following result is easily proved.

Theorem 9.7 Let $V$ be an inner product space.
1) The orthogonal complement $X^{\perp}$ of any subset $X \subseteq V$ is a subspace of $V$.
2) For any subspace $S$ of $V$,
   $$S \cap S^{\perp} = \{0\}$$ □
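Definition An inner product space $V$ is the orthogonal direct sum of
subspaces $S$ and $T$ if
$$V = S \oplus T, \quad S \perp T$$
In this case, we write
$$S \odot T$$
More generally, $V$ is the orthogonal direct sum of the subspaces $S_1, \ldots, S_n$,
written
$$V = S_1 \odot \cdots \odot S_n$$
if
$$V = S_1 \oplus \cdots \oplus S_n \quad\text{and}\quad S_i \perp S_j \text{ for } i \ne j$$ □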


Theorem 9.8 Let $V$ be an inner product space. The following are equivalent.
1) $V = S \odot T$
2) $V = S \oplus T$ and $T = S^{\perp}$

Proof. If $V = S \odot T$, then by definition, $T \subseteq S^{\perp}$. However, if $v \in S^{\perp}$, then
$v = s + t$ where $s \in S$ and $t \in T$. Then $s$ is orthogonal to both $t$ and $v$ and so $s$
is orthogonal to itself, which implies that $s = 0$ and so $v \in T$. Hence, $T = S^{\perp}$.
The converse is clear. □


Orthogonal and Orthonormal Sets 


Definition A nonempty set $O = \{u_i \mid i \in K\}$ of vectors in an inner product
space is said to be an orthogonal set if $u_i \perp u_j$ for all $i \ne j \in K$. If, in
addition, each vector $u_i$ is a unit vector, then $O$ is an orthonormal set. Thus, a
set is orthonormal if
$$\langle u_i, u_j \rangle = \delta_{i,j}$$
for all $i, j \in K$, where $\delta_{i,j}$ is the Kronecker delta function. □


Of course, given any nonzero vector $v \in V$, we may obtain a unit vector $u$ by
multiplying $v$ by the reciprocal of its norm:
$$u = \frac{1}{\|v\|}\,v$$
This process is referred to as normalizing the vector $v$. Thus, it is a simple
matter to construct an orthonormal set from an orthogonal set of nonzero
vectors.


Note that if $u \perp v$, then
$$\|u + v\|^2 = \|u\|^2 + \|v\|^2$$
and the converse holds if $F = \mathbb{R}$.


Orthogonality is stronger than linear independence. 


Theorem 9.9 Any orthogonal set of nonzero vectors in $V$ is linearly
independent.
Proof. If $O = \{u_i \mid i \in K\}$ is an orthogonal set of nonzero vectors and
$$r_1 u_1 + \cdots + r_n u_n = 0$$
then
$$0 = \langle r_1 u_1 + \cdots + r_n u_n, u_k \rangle = r_k \langle u_k, u_k \rangle$$
and so $r_k = 0$, for all $k$. Hence, $O$ is linearly independent. □

Gram-Schmidt Orthogonalization 


The Gram-Schmidt process can be used to transform a sequence of vectors into 
an orthogonal sequence. We begin with the following. 


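Theorem 9.10 (Gram-Schmidt augmentation) Let $V$ be an inner product
space and let $O = \{u_1, \ldots, u_n\}$ be an orthogonal set of vectors in $V$. If
$v \notin \langle u_1, \ldots, u_n \rangle$, then there is a nonzero $u \in V$ for which $\{u_1, \ldots, u_n, u\}$ is
orthogonal and
$$\langle u_1, \ldots, u_n, u \rangle = \langle u_1, \ldots, u_n, v \rangle$$
In particular,
$$u = v - \sum_{i=1}^{n} r_i u_i$$
where
$$r_i = \begin{cases} 0 & \text{if } u_i = 0 \\[4pt] \dfrac{\langle v, u_i \rangle}{\langle u_i, u_i \rangle} & \text{if } u_i \ne 0 \end{cases}$$
Proof. We simply set
$$u = v - r_1 u_1 - \cdots - r_n u_n$$
and force $u \perp u_i$ for all $i$, that is,
$$0 = \langle u, u_i \rangle = \langle v - r_1 u_1 - \cdots - r_n u_n, u_i \rangle = \langle v, u_i \rangle - r_i \langle u_i, u_i \rangle$$
Thus, if $u_i = 0$, take $r_i = 0$ and if $u_i \ne 0$, take
$$r_i = \frac{\langle v, u_i \rangle}{\langle u_i, u_i \rangle}$$ □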


The Gram-Schmidt augmentation is traditionally applied to a sequence of 
linearly independent vectors, but it also applies to any sequence of vectors. 


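Theorem 9.11 (The Gram-Schmidt orthogonalization process) Let
$B = (v_1, v_2, \ldots)$ be a sequence of vectors in an inner product space $V$. Define a
sequence $O = (u_1, u_2, \ldots)$ by repeated Gram-Schmidt augmentation, that is,
$$u_k = v_k - \sum_{i=1}^{k-1} r_{k,i}\, u_i$$
where $u_1 = v_1$ and
$$r_{k,i} = \begin{cases} 0 & \text{if } u_i = 0 \\[4pt] \dfrac{\langle v_k, u_i \rangle}{\langle u_i, u_i \rangle} & \text{if } u_i \ne 0 \end{cases}$$
Then $O$ is an orthogonal sequence in $V$ with the property that
$$\langle u_1, \ldots, u_k \rangle = \langle v_1, \ldots, v_k \rangle$$
for all $k > 0$. Also, $u_k = 0$ if and only if $v_k \in \langle v_1, \ldots, v_{k-1} \rangle$.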
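Proof. The result holds for $k = 1$. Assume it holds for $k - 1$. If
$v_k \in \langle v_1, \ldots, v_{k-1} \rangle$, then
$$v_k \in \langle v_1, \ldots, v_{k-1} \rangle = \langle u_1, \ldots, u_{k-1} \rangle$$
Writing
$$v_k = \sum_{i=1}^{k-1} a_i u_i$$
we have
$$\langle v_k, u_i \rangle = \begin{cases} 0 & \text{if } u_i = 0 \\ a_i \langle u_i, u_i \rangle & \text{if } u_i \ne 0 \end{cases}$$
Therefore, $a_i = r_{k,i}$ when $u_i \ne 0$ and so $u_k = 0$. Hence,
$$\langle u_1, \ldots, u_k \rangle = \langle u_1, \ldots, u_{k-1}, 0 \rangle = \langle v_1, \ldots, v_{k-1} \rangle = \langle v_1, \ldots, v_k \rangle$$
If $v_k \notin \langle v_1, \ldots, v_{k-1} \rangle$ then
$$\langle u_1, \ldots, u_k \rangle = \langle v_1, \ldots, v_{k-1}, u_k \rangle = \langle v_1, \ldots, v_{k-1}, v_k \rangle$$ □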
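
The process of Theorem 9.11 can be sketched in a few lines. The version below is one possible rendering, assuming vectors in $\mathbb{Q}^n$ with the dot product and exact arithmetic via `fractions.Fraction`; the names `inner` and `gram_schmidt` and the sample sequence are illustrative. The third input vector is deliberately the sum of the first two, so the corresponding output vector is zero, as the theorem predicts.

```python
from fractions import Fraction

def inner(u, v):
    """Dot product on Q^n (a real inner product, so no conjugation is needed)."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Return the orthogonal sequence (u_1, u_2, ...) built by repeated augmentation."""
    ortho = []
    for v in vectors:
        u = [Fraction(x) for x in v]
        for w in ortho:
            ww = inner(w, w)
            if ww == 0:               # u_i = 0: take r_{k,i} = 0
                continue
            r = inner(v, w) / ww      # r_{k,i} = <v_k, u_i> / <u_i, u_i>
            u = [a - r * b for a, b in zip(u, w)]
        ortho.append(u)
    return ortho

# Sample sequence: the third vector equals the sum of the first two.
B = [(1, 1, 0), (1, 0, 1), (2, 1, 1), (0, 1, 1)]
O = gram_schmidt(B)
```

For this input, `O[2]` is the zero vector and `O[3]` is $(-2/3,\ 2/3,\ 2/3)$, which is orthogonal to both `O[0]` and `O[1]`.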


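Example 9.4 Consider the inner product space $\mathbb{R}[x]$ of real polynomials, with
inner product defined by
$$\langle p(x), q(x) \rangle = \int_{-1}^{1} p(x)q(x)\,dx$$
Applying the Gram-Schmidt process to the sequence $B = (1, x, x^2, x^3, \ldots)$
gives
$$u_1(x) = 1$$
$$u_2(x) = x - \frac{\int_{-1}^{1} x\,dx}{\int_{-1}^{1} dx}\cdot 1 = x$$
$$u_3(x) = x^2 - \frac{\int_{-1}^{1} x^2\,dx}{\int_{-1}^{1} dx}\cdot 1 - \frac{\int_{-1}^{1} x^3\,dx}{\int_{-1}^{1} x^2\,dx}\cdot x = x^2 - \frac{1}{3}$$
$$u_4(x) = x^3 - \frac{\int_{-1}^{1} x^3\,dx}{\int_{-1}^{1} dx}\cdot 1 - \frac{\int_{-1}^{1} x^4\,dx}{\int_{-1}^{1} x^2\,dx}\cdot x - \frac{\int_{-1}^{1} x^3\left(x^2 - \frac{1}{3}\right)dx}{\int_{-1}^{1} \left(x^2 - \frac{1}{3}\right)^2 dx}\left(x^2 - \frac{1}{3}\right) = x^3 - \frac{3}{5}x$$
and so on. The polynomials in this sequence are (at least up to multiplicative
constants) the Legendre polynomials. □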
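
The computation in Example 9.4 can be reproduced exactly. The sketch below assumes polynomials are stored as coefficient lists `[c0, c1, c2, ...]` and that the inner product is the integral of $p(x)q(x)$ over $[-1, 1]$; the helper names (`poly_mul`, `inner`, `gram_schmidt`) are illustrative choices.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def inner(p, q):
    """<p, q> = integral of p(x) q(x) over [-1, 1]; odd powers integrate to 0."""
    return sum(c * Fraction(2, k + 1)
               for k, c in enumerate(poly_mul(p, q)) if k % 2 == 0)

def gram_schmidt(polys):
    """Gram-Schmidt on linearly independent polynomials of increasing degree."""
    ortho = []
    for v in polys:
        u = [Fraction(c) for c in v]
        for w in ortho:
            r = inner(v, w) / inner(w, w)                  # w is nonzero here
            padded = w + [Fraction(0)] * (len(u) - len(w))  # align coefficient lists
            u = [a - r * b for a, b in zip(u, padded)]
        ortho.append(u)
    return ortho

monomials = [[0] * k + [1] for k in range(4)]   # 1, x, x^2, x^3
legendre_like = gram_schmidt(monomials)
```

The last two outputs are the coefficient lists of $x^2 - \tfrac{1}{3}$ and $x^3 - \tfrac{3}{5}x$, matching the example.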


The QR Factorization 


The Gram-Schmidt process can be used to factor any real or complex matrix
into a product of a matrix with orthogonal columns and an upper triangular
matrix. Suppose that $A = (v_1 \mid v_2 \mid \cdots \mid v_n)$ is an $m \times n$ matrix with columns
$v_i$, where $n \le m$. The Gram-Schmidt process applied to these columns gives
orthogonal vectors $O = (u_1 \mid u_2 \mid \cdots \mid u_n)$ for which
$$\langle u_1, \ldots, u_k \rangle = \langle v_1, \ldots, v_k \rangle$$
for all $k \le n$. In particular,
$$v_k = u_k + \sum_{i=1}^{k-1} r_{k,i}\, u_i$$
where
$$r_{k,i} = \begin{cases} \dfrac{\langle v_k, u_i \rangle}{\langle u_i, u_i \rangle} & \text{if } u_i \ne 0 \\[4pt] 0 & \text{if } u_i = 0 \end{cases}$$


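In matrix terms,
$$(v_1 \mid v_2 \mid \cdots \mid v_n) = (u_1 \mid u_2 \mid \cdots \mid u_n)
\begin{pmatrix} 1 & r_{2,1} & \cdots & r_{n,1} \\ & 1 & \cdots & r_{n,2} \\ & & \ddots & \vdots \\ & & & 1 \end{pmatrix}$$
that is, $A = OB$ where $O$ has orthogonal columns and $B$ is upper triangular.
We may normalize the nonzero columns $u_i$ of $O$ and move the positive
constants to $B$. In particular, if $a_i = \|u_i\|$ for $u_i \ne 0$ and $a_i = 1$ for $u_i = 0$, then
$$(v_1 \mid v_2 \mid \cdots \mid v_n) = \left(\frac{u_1}{a_1} \,\Big|\, \frac{u_2}{a_2} \,\Big|\, \cdots \,\Big|\, \frac{u_n}{a_n}\right)
\begin{pmatrix} a_1 & a_1 r_{2,1} & \cdots & a_1 r_{n,1} \\ & a_2 & \cdots & a_2 r_{n,2} \\ & & \ddots & \vdots \\ & & & a_n \end{pmatrix}$$
and so
$$A = QR$$
where the columns of $Q$ are orthogonal and each column is either a unit vector
or the zero vector and $R$ is upper triangular with positive entries on the main
diagonal. Moreover, if the vectors $v_1, \ldots, v_n$ are linearly independent, then the
columns of $Q$ are nonzero. Also, if $m = n$ and $A$ is nonsingular, then $Q$ is
unitary/orthogonal.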


If the columns of $A$ are not linearly independent, we can make one final
adjustment to this matrix factorization. If a column $u_i/a_i$ is zero, then we may
replace this column by any vector as long as we replace the $(i, i)$th entry $a_i$ in $R$
by 0. Therefore, we can take nonzero columns of $Q$, extend to an orthonormal
basis for the span of the columns of $Q$ and replace the zero columns of $Q$ by the
additional members of this orthonormal basis. In this way, $Q$ is replaced by a
unitary/orthogonal matrix $Q'$ and $R$ is replaced by an upper triangular matrix $R'$
that has nonnegative entries on the main diagonal.




Theorem 9.12 Let $A \in \mathcal{M}_{m,n}(F)$, where $F = \mathbb{C}$ or $F = \mathbb{R}$. There exists a
matrix $Q \in \mathcal{M}_{m,n}(F)$ with orthonormal columns and an upper triangular
matrix $R \in \mathcal{M}_n(F)$ with nonnegative real entries on the main diagonal for
which
$$A = QR$$
Moreover, if $m = n$, then $Q$ is unitary/orthogonal. If $A$ is nonsingular, then $R$
can be chosen to have positive entries on the main diagonal, in which case the
factors $Q$ and $R$ are unique. The factorization $A = QR$ is called the QR
factorization of the matrix $A$. If $A$ is real, then $Q$ and $R$ may be taken to be
real.

Proof. As to uniqueness, if $A$ is nonsingular and $QR = Q_1 R_1$, then
$$Q_1^{-1} Q = R_1 R^{-1}$$
and the right side is upper triangular with positive entries on the main diagonal
and the left side is unitary. But an upper triangular matrix with positive entries
on the main diagonal is unitary if and only if it is the identity and so $Q_1 = Q$
and $R_1 = R$. Finally, if $A$ is real, then all computations take place in the real
field and so $Q$ and $R$ are real. □


The QR decomposition has important applications. For example, a system of
linear equations $Ax = u$ can be written in the form
$$QRx = u$$
and since $Q^{-1} = Q^*$, we have
$$Rx = Q^* u$$
This is an upper triangular system, which is easily solved by back substitution;
that is, starting from the bottom and working up.
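
As a rough illustration of this use of the factorization, the sketch below factors a square nonsingular real matrix by Gram-Schmidt on its columns and then solves $Rx = Q^{T}u$ by back substitution. It is a minimal floating-point version for small examples, not a numerically robust routine; the function names `qr` and `solve` and the sample system are assumptions made here.

```python
import math

def qr(A):
    """QR factorization of a square nonsingular real matrix via Gram-Schmidt."""
    n = len(A)
    cols = [[A[i][j] for i in range(n)] for j in range(n)]
    Q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for k, v in enumerate(cols):
        u = list(v)
        for i, q in enumerate(Q_cols):
            R[i][k] = sum(a * b for a, b in zip(v, q))    # <v_k, q_i>
            u = [a - R[i][k] * b for a, b in zip(u, q)]
        R[k][k] = math.sqrt(sum(a * a for a in u))         # positive: A is nonsingular
        Q_cols.append([a / R[k][k] for a in u])
    Q = [[Q_cols[j][i] for j in range(n)] for i in range(n)]
    return Q, R

def solve(A, b):
    """Solve Ax = b by writing A = QR and back-substituting in Rx = Q^T b."""
    Q, R = qr(A)
    n = len(A)
    c = [sum(Q[i][j] * b[i] for i in range(n)) for j in range(n)]   # c = Q^T b
    x = [0.0] * n
    for i in reversed(range(n)):                                     # bottom row first
        x[i] = (c[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))) / R[i][i]
    return x

A = [[2.0, 1.0], [1.0, 3.0]]
x = solve(A, [3.0, 5.0])
```

For this system the exact solution is $x = (4/5,\ 7/5)$, and the computed `x` agrees with it up to rounding.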


We mention also that the QR factorization is associated with an algorithm for 
approximating the eigenvalues of a matrix, called the QR algorithm. 
Specifically, if $A = A_0$ is an $n \times n$ matrix, define a sequence of matrices as
follows:

1) Let $A_0 = Q_0 R_0$ be the QR factorization of $A_0$ and let $A_1 = R_0 Q_0$.
2) Once $A_k$ has been defined, let $A_k = Q_k R_k$ be the QR factorization of $A_k$
   and let $A_{k+1} = R_k Q_k$.

Then $A_k$ is unitarily/orthogonally similar to $A$, since
$$Q_{k-1} A_k Q_{k-1}^{-1} = Q_{k-1}(R_{k-1} Q_{k-1}) Q_{k-1}^{-1} = Q_{k-1} R_{k-1} = A_{k-1}$$
For complex matrices, it can be shown that under certain circumstances, such as
when the eigenvalues of $A$ have distinct norms, the sequence $A_k$ converges
(entrywise) to an upper triangular matrix $U$, which therefore has the eigenvalues
of $A$ on its main diagonal. Results can be obtained in the real case as well. For
more details, we refer the reader to [48], page 115.
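
The iteration $A_{k+1} = R_k Q_k$ can be tried on a small example. The following sketch assumes a real symmetric $2 \times 2$ matrix with eigenvalues 3 and 1 (so the eigenvalues have distinct absolute values) and uses a plain Gram-Schmidt QR step; it is the basic unshifted iteration only, and the names `qr`, `matmul` and `qr_algorithm` are illustrative.

```python
import math

def qr(A):
    """QR factorization of a square nonsingular real matrix via Gram-Schmidt."""
    n = len(A)
    cols = [[A[i][j] for i in range(n)] for j in range(n)]
    Q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for k, v in enumerate(cols):
        u = list(v)
        for i, q in enumerate(Q_cols):
            R[i][k] = sum(a * b for a, b in zip(v, q))
            u = [a - R[i][k] * b for a, b in zip(u, q)]
        R[k][k] = math.sqrt(sum(a * a for a in u))
        Q_cols.append([a / R[k][k] for a in u])
    Q = [[Q_cols[j][i] for j in range(n)] for i in range(n)]
    return Q, R

def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def qr_algorithm(A, steps=50):
    """Iterate A_{k+1} = R_k Q_k, where A_k = Q_k R_k."""
    for _ in range(steps):
        Q, R = qr(A)
        A = matmul(R, Q)
    return A

A = [[2.0, 1.0], [1.0, 2.0]]          # eigenvalues 3 and 1
Ak = qr_algorithm(A)
diagonal = sorted(Ak[i][i] for i in range(2))
```

After the iteration, the subdiagonal entry of `Ak` is negligible and the diagonal entries approximate the eigenvalues 1 and 3. The first step already gives $A_1$ with diagonal $(2.8,\ 1.2)$ and the same trace as $A$, as similarity requires.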


Hilbert and Hamel Bases 


Definition A maximal orthonormal set in an inner product space $V$ is called a
Hilbert basis for $V$. □


Zorn's lemma can be used to show that any nontrivial inner product space has a 
Hilbert basis. We leave the details to the reader. 


Some care must be taken not to confuse the concepts of a basis for a vector 
space and a Hilbert basis for an inner product space. To avoid confusion, a 
vector space basis, that is, a maximal linearly independent set of vectors, is 
referred to as a Hamel basis. We will refer to an orthonormal Hamel basis as an 
orthonormal basis. 


To be perfectly clear, there are maximal linearly independent sets called 
(Hamel) bases and maximal orthonormal sets (called Hilbert bases). If a 
maximal linearly independent set (basis) is orthonormal, it is called an 
orthonormal basis. 


Moreover, since every orthonormal set is linearly independent, it follows that an 
orthonormal basis is a Hilbert basis, since it cannot be properly contained in an 
orthonormal set. For finite-dimensional inner product spaces, the two types of 
bases are the same. 


Theorem 9.13 Let $V$ be an inner product space. A finite subset
$O = \{u_1, \ldots, u_k\}$ of $V$ is an orthonormal (Hamel) basis for $V$ if and only if it is
a Hilbert basis for $V$.

Proof. We have seen that any orthonormal basis is a Hilbert basis. Conversely,
if $O$ is a finite maximal orthonormal set and $O \subsetneq P$, where $P$ is linearly
independent, then we may apply Gram-Schmidt augmentation (Theorem 9.10) to a
vector of $P \setminus O$ and normalize the result, extending $O$ to a strictly larger
orthonormal set, in contradiction to the maximality of $O$. Hence, $O$ is maximal
linearly independent. □


The following example shows that the previous theorem fails for infinite- 
dimensional inner product spaces. 
Example 9.5 Let $V = \ell^2$ and let $M$ be the set of all vectors of the form
$$e_i = (0, \ldots, 0, 1, 0, \ldots)$$
where $e_i$ has a 1 in the $i$th coordinate and 0's elsewhere. Clearly, $M$ is an
orthonormal set. Moreover, it is maximal. For if $v = (x_n) \in \ell^2$ has the property
that $v \perp M$, then
$$x_i = \langle v, e_i \rangle = 0$$
for all $i$ and so $v = 0$. Hence, no nonzero vector $v \notin M$ is orthogonal to $M$.
This shows that $M$ is a Hilbert basis for the inner product space $\ell^2$.

On the other hand, the vector space span of $M$ is the subspace $S$ of all
sequences in $\ell^2$ that have finite support, that is, have only a finite number of
nonzero terms and since $\operatorname{span}(M) = S \ne \ell^2$, we see that $M$ is not a Hamel
basis for the vector space $\ell^2$. □


The Projection Theorem and Best Approximations 


Orthonormal bases have a great practical advantage over arbitrary bases. From a 
computational point of view, if $B = \{v_1, \ldots, v_n\}$ is a basis for $V$, then each
$v \in V$ has the form
$$v = r_1 v_1 + \cdots + r_n v_n$$
In general, determining the coordinates $r_i$ requires solving a system of linear
equations of size $n \times n$.

On the other hand, if $O = \{u_1, \ldots, u_n\}$ is an orthonormal basis for $V$ and
$$v = r_1 u_1 + \cdots + r_n u_n$$
then the coefficients $r_i$ are quite easily computed:
$$\langle v, u_i \rangle = \langle r_1 u_1 + \cdots + r_n u_n, u_i \rangle = r_i \langle u_i, u_i \rangle = r_i$$

Even if $O = \{u_1, \ldots, u_n\}$ is not a basis (but just an orthonormal set), we can
still consider the expansion
$$\hat{v} = \langle v, u_1 \rangle u_1 + \cdots + \langle v, u_n \rangle u_n$$

Theorem 9.14 Let $O = \{u_1, \ldots, u_k\}$ be an orthonormal subset of an inner
product space $V$ and let $S = \langle O \rangle$. The Fourier expansion with respect to $O$ of
a vector $v \in V$ is
$$\hat{v} = \langle v, u_1 \rangle u_1 + \cdots + \langle v, u_k \rangle u_k$$
Each coefficient $\langle v, u_i \rangle$ is called a Fourier coefficient of $v$ with respect to $O$.
The vector $\hat{v}$ can be characterized as follows:
1) $\hat{v}$ is the unique vector $s \in S$ for which $(v - s) \perp S$.
2) $\hat{v}$ is the best approximation to $v$ from within $S$, that is, $\hat{v}$ is the unique
   vector $s \in S$ that is closest to $v$, in the sense that
   $$\|v - \hat{v}\| < \|v - s\|$$
   for all $s \in S \setminus \{\hat{v}\}$.
3) Bessel's inequality holds for all $v \in V$, that is
   $$\|\hat{v}\| \le \|v\|$$
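Proof. For part 1), since
$$\langle v - \hat{v}, u_i \rangle = \langle v, u_i \rangle - \langle \hat{v}, u_i \rangle = 0$$
it follows that $v - \hat{v} \in S^{\perp}$. Also, if $v - s \in S^{\perp}$ for $s \in S$, then $s - \hat{v} \in S$ and
$$s - \hat{v} = (v - \hat{v}) - (v - s) \in S^{\perp}$$
and so $s = \hat{v}$. For part 2), if $s \in S$, then $v - \hat{v} \in S^{\perp}$ implies that
$(v - \hat{v}) \perp (\hat{v} - s)$ and so
$$\|v - s\|^2 = \|v - \hat{v} + \hat{v} - s\|^2 = \|v - \hat{v}\|^2 + \|\hat{v} - s\|^2$$
Hence, $\|v - s\|$ is smallest if and only if $s = \hat{v}$ and the smallest value is
$\|v - \hat{v}\|$. We leave proof of Bessel's inequality as an exercise. □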
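
A small numerical instance may help make Theorem 9.14 concrete. The sketch below assumes the real inner product space $\mathbb{Q}^3$ with the dot product and an orthonormal set of two vectors chosen with rational entries so that all arithmetic is exact; the names `inner` and `fourier_expansion` are illustrative.

```python
from fractions import Fraction

def inner(u, v):
    """Dot product on Q^n (real case, so no conjugation)."""
    return sum(a * b for a, b in zip(u, v))

def fourier_expansion(v, ortho_set):
    """Return the sum of <v, u_i> u_i over an orthonormal set."""
    v_hat = [Fraction(0)] * len(v)
    for u in ortho_set:
        c = inner(v, u)                                  # Fourier coefficient <v, u_i>
        v_hat = [a + c * b for a, b in zip(v_hat, u)]
    return v_hat

# An orthonormal set in Q^3 (hypothetical sample data).
O = [(Fraction(3, 5), Fraction(4, 5), 0), (0, 0, 1)]
v = (1, 2, 3)

v_hat = fourier_expansion(v, O)
residual = [a - b for a, b in zip(v, v_hat)]             # v - v_hat, which lies in S-perp

# Compare the distance to v_hat with the distance to another vector s in S.
s = O[0]
dist_sq_best = inner(residual, residual)
dist_sq_other = inner([a - b for a, b in zip(v, s)], [a - b for a, b in zip(v, s)])
```

Here $\hat{v} = (33/25,\ 44/25,\ 3)$, the residual $(-8/25,\ 6/25,\ 0)$ is orthogonal to both vectors of the set, and $\|\hat{v}\|^2 = 346/25 \le 14 = \|v\|^2$, in line with Bessel's inequality.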


Theorem 9.15 (The projection theorem) If $S$ is a finite-dimensional subspace
of an inner product space $V$, then
$$V = S \odot S^{\perp}$$
In particular, if $v \in V$, then
$$v = \hat{v} + (v - \hat{v}) \in S \odot S^{\perp}$$
It follows that
$$\dim(V) = \dim(S) + \dim(S^{\perp})$$
Proof. We have seen that $v - \hat{v} \in S^{\perp}$ and so $V = S + S^{\perp}$. But $S \cap S^{\perp} = \{0\}$
and so $V = S \odot S^{\perp}$. □


The following example shows that the projection theorem may fail if S is not 
finite-dimensional. Indeed, in the infinite-dimensional case, S must be a 
complete subspace, but we postpone a discussion of this case until Chapter 13. 


Example 9.6 As in Example 9.5, let $V = \ell^2$ and let $S$ be the subspace of all
sequences with finite support, that is, $S$ is spanned by the vectors
$$e_i = (0, \ldots, 0, 1, 0, \ldots)$$
where $e_i$ has a 1 in the $i$th coordinate and 0's elsewhere. If $x = (x_n) \in S^{\perp}$, then
$x_i = \langle x, e_i \rangle = 0$ for all $i$ and so $x = 0$. Therefore, $S^{\perp} = \{0\}$. However,
$$S \odot S^{\perp} = S \ne \ell^2$$ □


The projection theorem has a variety of uses. 




Theorem 9.16 Let $V$ be an inner product space and let $S$ be a finite-
dimensional subspace of $V$.
1) $S^{\perp\perp} = S$
2) If $X \subseteq V$ and $\dim(\langle X \rangle) < \infty$, then
   $$X^{\perp\perp} = \langle X \rangle$$
Proof. For part 1), it is clear that $S \subseteq S^{\perp\perp}$. On the other hand, if $v \in S^{\perp\perp}$, then
the projection theorem implies that $v = s + s'$ where $s \in S$ and $s' \in S^{\perp}$. Then
$s'$ is orthogonal to both $s$ and $v$ and so $s'$ is orthogonal to itself. Hence, $s' = 0$
and $v = s \in S$ and so $S = S^{\perp\perp}$. We leave the proof of part 2) as an exercise. □


Characterizing Orthonormal Bases 
We can characterize orthonormal bases using Fourier expansions. 
Theorem 9.17 Let $O = \{u_1, \ldots, u_k\}$ be an orthonormal subset of an inner
product space $V$ and let $S = \langle O \rangle$. The following are equivalent:
1) $O$ is an orthonormal basis for $V$.
2) $\langle O \rangle^{\perp} = \{0\}$
3) Every vector is equal to its Fourier expansion, that is, for all $v \in V$,
   $$\hat{v} = v$$
4) Bessel's identity holds for all $v \in V$, that is,
   $$\|\hat{v}\| = \|v\|$$
5) Parseval's identity holds for all $v, w \in V$, that is,
   $$\langle v, w \rangle = [\hat{v}]_O \cdot [\hat{w}]_O$$
   where
   $$[\hat{v}]_O \cdot [\hat{w}]_O = \langle v, u_1 \rangle \overline{\langle w, u_1 \rangle} + \cdots + \langle v, u_k \rangle \overline{\langle w, u_k \rangle}$$
   is the standard dot product in $F^k$.
Proof. To see that 1) implies 2), if $v \in \langle O \rangle^{\perp}$ is nonzero, then $O \cup \{v/\|v\|\}$ is
orthonormal and so $O$ is not maximal. Conversely, if $O$ is not maximal, there is
an orthonormal set $P$ for which $O \subsetneq P$. Then any nonzero $v \in P \setminus O$ is in
$\langle O \rangle^{\perp}$. Hence, 2) implies 1). We leave the rest of the proof as an exercise. □


The Riesz Representation Theorem 


We have been dealing with linear maps for some time. We now have a need for 
conjugate linear maps. 


Definition A function $\sigma\colon V \to W$ on complex vector spaces is conjugate linear
if it is additive,
$$\sigma(v_1 + v_2) = \sigma v_1 + \sigma v_2$$
and
$$\sigma(rv) = \overline{r}\,\sigma v$$
for all $r \in \mathbb{C}$. A conjugate isomorphism is a bijective conjugate linear map. □


If $x \in V$, then the inner product function $\langle \cdot, x \rangle\colon V \to F$ defined by
$$\langle \cdot, x \rangle v = \langle v, x \rangle$$
is a linear functional on $V$. Thus, the map $\tau\colon V \to V^*$ defined by
$$\tau x = \langle \cdot, x \rangle$$
is conjugate linear. Moreover, since $\langle \cdot, x \rangle = \langle \cdot, y \rangle$ implies $x = y$, it follows
that $\tau$ is injective and therefore a conjugate isomorphism (since $V$ is finite-
dimensional).


Theorem 9.18 (The Riesz representation theorem) Let $V$ be a finite-
dimensional inner product space.
1) The map $\tau\colon V \to V^*$ defined by
   $$\tau x = \langle \cdot, x \rangle$$
   is a conjugate isomorphism. In particular, for each $f \in V^*$, there exists a
   unique vector $x \in V$ for which $f = \langle \cdot, x \rangle$, that is,
   $$f v = \langle v, x \rangle$$
   for all $v \in V$. We call $x$ the Riesz vector for $f$ and denote it by $R_f$.
2) The map $R\colon V^* \to V$ defined by
   $$R f = R_f$$
   is also a conjugate isomorphism, being the inverse of $\tau$. We will call this
   map the Riesz map.
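Proof. Here is the usual proof that $\tau$ is surjective. If $f = 0$, then $R_f = 0$, so let
us assume that $f \ne 0$. Then $K = \ker(f)$ has codimension 1 and so
$$V = \langle w \rangle \odot K$$
for $w \in K^{\perp}$. Letting $x = \alpha w$ for $\alpha \in F$, we require that
$$f(v) = \langle v, \alpha w \rangle$$
and since this clearly holds for any $v \in K$, it is sufficient to show that it holds
for $v = w$, that is,
$$f(w) = \langle w, \alpha w \rangle = \overline{\alpha}\langle w, w \rangle$$
Thus, $\alpha = \overline{f(w)}/\|w\|^2$ and
$$R_f = \frac{\overline{f(w)}}{\|w\|^2}\,w$$
For part 2), we have
$$\langle v, R_{rf+sg} \rangle = (rf + sg)(v)
  = rf(v) + sg(v)
  = \langle v, \overline{r}R_f \rangle + \langle v, \overline{s}R_g \rangle
  = \langle v, \overline{r}R_f + \overline{s}R_g \rangle$$
for all $v \in V$ and so
$$R_{rf+sg} = \overline{r}R_f + \overline{s}R_g$$ □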


Note that if $V = \mathbb{R}^n$, then $R_f = (f(e_1), \ldots, f(e_n))$, where $(e_1, \ldots, e_n)$ is the
standard basis for $\mathbb{R}^n$.
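
This remark translates directly into a computation. In the sketch below, the functional `f` is a hypothetical example on $\mathbb{R}^3$ and `riesz_vector` is an illustrative helper that evaluates a linear functional on the standard basis.

```python
def riesz_vector(f, n):
    """Return R_f = (f(e_1), ..., f(e_n)) for a linear functional f on R^n."""
    basis = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    return [f(e) for e in basis]

def f(v):
    """A sample linear functional on R^3 (hypothetical)."""
    return 2 * v[0] - v[1] + 4 * v[2]

R_f = riesz_vector(f, 3)

# f(v) agrees with the dot product <v, R_f> for any v.
v = [3, 5, -2]
value_by_f = f(v)
value_by_inner_product = sum(a * b for a, b in zip(v, R_f))
```

Here `R_f` is `[2, -1, 4]`, and both ways of evaluating at `v` give `-7`.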




Exercises 


1. Prove that if a matrix $M$ is unitary, upper triangular and has positive entries
   on the main diagonal, then $M$ must be the identity matrix.

2. Use the QR factorization to show that any triangularizable matrix is
   unitarily (orthogonally) triangularizable.

3. Verify the statement concerning equality in the triangle inequality.

4. Prove the parallelogram law.

5. Prove the Apollonius identity
   $$\|w - u\|^2 + \|w - v\|^2 = \frac{1}{2}\|u - v\|^2 + 2\left\|w - \frac{1}{2}(u + v)\right\|^2$$

6. Let $V$ be an inner product space with basis $B$. Show that the inner product
   is uniquely defined by the values $\langle u, v \rangle$, for all $u, v \in B$.

7. Prove that two vectors $u$ and $v$ in a real inner product space $V$ are
   orthogonal if and only if
   $$\|u + v\|^2 = \|u\|^2 + \|v\|^2$$

8. Show that an isometry is injective.

9. Use Zorn's lemma to show that any nontrivial inner product space has a
   Hilbert basis.

10. Prove Bessel's inequality.

11. Prove that an orthonormal set $O$ is a Hilbert basis for a finite-dimensional
    vector space $V$ if and only if $\hat{v} = v$, for all $v \in V$.

12. Prove that an orthonormal set $O$ is a Hilbert basis for a finite-dimensional
    vector space $V$ if and only if Bessel's identity holds for all $v \in V$, that is, if
    and only if
    $$\|\hat{v}\| = \|v\|$$
    for all $v \in V$.

13. Prove that an orthonormal set $O$ is a Hilbert basis for a finite-dimensional
    vector space $V$ if and only if Parseval's identity holds for all $v, w \in V$, that
    is, if and only if
    $$\langle v, w \rangle = [\hat{v}]_O \cdot [\hat{w}]_O$$
    for all $v, w \in V$.

14. Let $u = (r_1, \ldots, r_n)$ and $v = (s_1, \ldots, s_n)$ be in $\mathbb{R}^n$. The Cauchy-Schwarz
    inequality states that
    $$|r_1 s_1 + \cdots + r_n s_n|^2 \le (r_1^2 + \cdots + r_n^2)(s_1^2 + \cdots + s_n^2)$$
    Prove that we can do better:
    $$(|r_1 s_1| + \cdots + |r_n s_n|)^2 \le (r_1^2 + \cdots + r_n^2)(s_1^2 + \cdots + s_n^2)$$

15. Let $V$ be a finite-dimensional inner product space. Prove that for any subset
    $X$ of $V$, we have $X^{\perp\perp} = \operatorname{span}(X)$.

16. Let $P_3$ be the inner product space of all polynomials of degree at most 3,
    under the inner product


    Apply the Gram-Schmidt process to the basis $\{1, x, x^2, x^3\}$, thereby
    computing the first four Hermite polynomials (at least up to a
    multiplicative constant).

17. Verify uniqueness in the Riesz representation theorem.

18. Let $V$ be a complex inner product space and let $S$ be a subspace of $V$.
    Suppose that $v \in V$ is a vector for which $\langle v, s \rangle + \langle s, v \rangle \le \langle s, s \rangle$ for all
    $s \in S$. Prove that $v \in S^{\perp}$.

19. If $V$ and $W$ are inner product spaces, consider the function on $V \boxplus W$
    defined by
    $$\langle (v_1, w_1), (v_2, w_2) \rangle = \langle v_1, v_2 \rangle + \langle w_1, w_2 \rangle$$
    Is this an inner product on $V \boxplus W$?

20. A normed vector space over $\mathbb{R}$ or $\mathbb{C}$ is a vector space (over $\mathbb{R}$ or $\mathbb{C}$)
    together with a function $\|\cdot\|\colon V \to \mathbb{R}$ for which for all $u, v \in V$ and scalars $r$
    we have
    a) $\|rv\| = |r|\,\|v\|$
    b) $\|u + v\| \le \|u\| + \|v\|$
    c) $\|v\| = 0$ if and only if $v = 0$
    If $V$ is a real normed space (over $\mathbb{R}$) and if the norm satisfies the
    parallelogram law
    $$\|u + v\|^2 + \|u - v\|^2 = 2\|u\|^2 + 2\|v\|^2$$
    prove that the polarization identity
    $$\langle u, v \rangle = \frac{1}{4}\left(\|u + v\|^2 - \|u - v\|^2\right)$$
    defines an inner product on $V$. Hint: Evaluate $8\langle u, x \rangle + 8\langle v, x \rangle$ to show
    that $\langle u, 2x \rangle = 2\langle u, x \rangle$ and $\langle u, x \rangle + \langle v, x \rangle = \langle u + v, x \rangle$. Then complete the
    proof that $\langle u, rx \rangle = r\langle u, x \rangle$.

21. Let S be a subspace of a finite-dimensional inner product space V. Prove 
that each coset in V / S contains exactly one vector that is orthogonal to S. 


Extensions of Linear Functionals 


22. Let $f$ be a linear functional on a subspace $S$ of a finite-dimensional inner
    product space $V$. Let $f(v) = \langle v, R_f \rangle$. Suppose that $g \in V^*$ is an extension
    of $f$, that is, $g|_S = f$. What is the relationship between the Riesz vectors $R_f$
    and $R_g$?

23. Let $f$ be a nonzero linear functional on a subspace $S$ of a finite-dimensional
    inner product space $V$ and let $K = \ker(f)$. Show that if $g \in V^*$ is an
    extension of $f$, then $R_g \in K^{\perp} \setminus S^{\perp}$. Moreover, for each vector
    $u \in K^{\perp} \setminus S^{\perp}$ there is exactly one scalar $\lambda$ for which the linear functional
    $g(x) = \langle x, \lambda u \rangle$ is an extension of $f$.


Positive Linear Functionals on R” 


A vector $v = (a_1, \ldots, a_n)$ in $\mathbb{R}^n$ is nonnegative (also called positive), written
$v \ge 0$, if $a_i \ge 0$ for all $i$. The vector $v$ is strictly positive, written $v > 0$, if $v$ is
nonnegative but not 0. The set $\mathbb{R}^n_{+}$ of all nonnegative vectors in $\mathbb{R}^n$ is called
the nonnegative orthant in $\mathbb{R}^n$. The vector $v$ is strongly positive, written
$v \gg 0$, if $a_i > 0$ for all $i$. The set $\mathbb{R}^n_{++}$ of all strongly positive vectors in $\mathbb{R}^n$ is
the strongly positive orthant in $\mathbb{R}^n$.

Let $f\colon S \to \mathbb{R}$ be a linear functional on a subspace $S$ of $\mathbb{R}^n$. Then $f$ is
nonnegative (also called positive), written $f \ge 0$, if
$$v \ge 0 \ \Rightarrow\ f(v) \ge 0$$
for all $v \in S$ and $f$ is strictly positive, written $f > 0$, if
$$v > 0 \ \Rightarrow\ f(v) > 0$$
for all $v \in S$.

24. Prove that a linear functional $f$ on $\mathbb{R}^n$ is positive if and only if $R_f \ge 0$ and
    strictly positive if and only if $R_f \gg 0$. If $S$ is a subspace of $\mathbb{R}^n$ is it true
    that a linear functional $f$ on $S$ is nonnegative if and only if $R_f \ge 0$?




25. Let $f\colon S \to \mathbb{R}$ be a strictly positive linear functional on a subspace $S$ of $\mathbb{R}^n$.
    Prove that $f$ has a strictly positive extension to $\mathbb{R}^n$. Use the fact that if
    $U \cap \mathbb{R}^n_{+} = \{0\}$, where
    $$\mathbb{R}^n_{+} = \{(a_1, \ldots, a_n) \mid a_i \ge 0 \text{ for all } i\}$$
    and $U$ is a subspace of $\mathbb{R}^n$, then $U^{\perp}$ contains a strongly positive vector.

26. If $V$ is a real inner product space, then we can define an inner product on its
    complexification $V^{\mathbb{C}}$ as follows (this is the same formula as for the ordinary
    inner product on a complex vector space):
    $$\langle u + vi, x + yi \rangle = \langle u, x \rangle + \langle v, y \rangle + (\langle v, x \rangle - \langle u, y \rangle)i$$
    Show that
    $$\|u + vi\|^2 = \|u\|^2 + \|v\|^2$$
    where the norm on the left is induced by the inner product on $V^{\mathbb{C}}$ and the
    norm on the right is induced by the inner product on $V$.


Chapter 10 
Structure Theory for Normal Operators 


Throughout this chapter, all vector spaces are assumed to be finite-dimensional 
unless otherwise noted. Also, the field F is either R or C. 


The Adjoint of a Linear Operator 


The purpose of this chapter is to study the structure of certain special types of 
linear operators on finite-dimensional real and complex inner product spaces. In 
order to define these operators, we introduce another type of adjoint (different 
from the operator adjoint of Chapter 3). 


Theorem 10.1 Let $V$ and $W$ be finite-dimensional inner product spaces over $F$
and let $\tau \in L(V,W)$. Then there is a unique function $\tau^*\colon W \to V$, defined by
the condition
$$\langle \tau v, w\rangle = \langle v, \tau^* w\rangle$$
for all $v \in V$ and $w \in W$. This function is in $L(W,V)$ and is called the adjoint
of $\tau$.


Proof. If $\tau^*$ exists, then it is unique, for if
$$\langle \tau v, w\rangle = \langle v, \sigma w\rangle$$
then $\langle v, \sigma w\rangle = \langle v, \tau^* w\rangle$ for all $v$ and $w$ and so $\sigma = \tau^*$.

We seek a linear map $\tau^*\colon W \to V$ for which
$$\langle v, \tau^* w\rangle = \langle \tau v, w\rangle$$

By way of motivation, the vector $\tau^* w$, if it exists, looks very much like a linear
map sending $v$ to $\langle \tau v, w\rangle$. The only problem is that $\tau^* w$ is supposed to be a
vector, not a linear map. But the Riesz representation theorem tells us that linear
maps can be represented by vectors.




Specifically, for each $w \in W$, the linear functional $f_w \in V^*$ defined by
$$f_w v = \langle \tau v, w\rangle$$
has the form
$$f_w v = \langle v, R_{f_w}\rangle$$
where $R_{f_w} \in V$ is the Riesz vector for $f_w$. If $\tau^*\colon W \to V$ is defined by
$$\tau^* w = R_{f_w} = R(f_w)$$
where $R$ is the Riesz map, then
$$\langle v, \tau^* w\rangle = \langle v, R_{f_w}\rangle = f_w v = \langle \tau v, w\rangle$$

Finally, since $\tau^* = R \circ f$ is the composition of the Riesz map $R$ and the map
$f\colon w \mapsto f_w$ and since both of these maps are conjugate linear, their composition
is linear. $\square$


Here are some of the basic properties of the adjoint. 


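Theorem 10.2 Let $V$ and $W$ be finite-dimensional inner product spaces. For
every $\sigma, \tau \in L(V,W)$ and $r \in F$,
1) $(\sigma + \tau)^* = \sigma^* + \tau^*$
2) $(r\tau)^* = \bar{r}\tau^*$
3) $\tau^{**} = \tau$ and so $\langle \tau^* v, w\rangle = \langle v, \tau w\rangle$
4) If $V = W$, then $(\sigma\tau)^* = \tau^*\sigma^*$
5) If $\tau$ is invertible, then $(\tau^{-1})^* = (\tau^*)^{-1}$
6) If $V = W$ and $p(x) \in \mathbb{R}[x]$, then $p(\tau)^* = p(\tau^*)$.

Moreover, if $\tau \in L(V)$ and $S$ is a subspace of $V$, then
7) $S$ is $\tau$-invariant if and only if $S^\perp$ is $\tau^*$-invariant.
8) $(S, S^\perp)$ reduces $\tau$ if and only if $S$ is both $\tau$-invariant and $\tau^*$-invariant, in
which case
$$(\tau|_S)^* = (\tau^*)|_S$$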
Proof. For part 7), let $s \in S$ and $z \in S^\perp$ and write
$$\langle \tau^* z, s\rangle = \langle z, \tau s\rangle$$

Now, if $S$ is $\tau$-invariant, then $\langle \tau^* z, s\rangle = 0$ for all $s \in S$ and so $\tau^* z \in S^\perp$ and
$S^\perp$ is $\tau^*$-invariant. Conversely, if $S^\perp$ is $\tau^*$-invariant, then $\langle z, \tau s\rangle = 0$ for all
$z \in S^\perp$ and so $\tau s \in S^{\perp\perp} = S$, whence $S$ is $\tau$-invariant.

The first statement in part 8) follows from part 7) applied to both $S$ and $S^\perp$. For
the second statement, since $S$ is both $\tau$-invariant and $\tau^*$-invariant, if $s, t \in S$,




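then
$$\langle s, (\tau^*)|_S(t)\rangle = \langle s, \tau^* t\rangle = \langle \tau s, t\rangle = \langle \tau|_S(s), t\rangle$$
and so, by the uniqueness of the adjoint, $(\tau|_S)^* = (\tau^*)|_S$. $\square$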


Now let us relate the kernel and image of a linear transformation to those of its 
adjoint. 


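Theorem 10.3 Let $\tau \in L(V, W)$, where $V$ and $W$ are finite-dimensional inner
product spaces.
1) $\ker(\tau^*) = \operatorname{im}(\tau)^\perp$ and $\operatorname{im}(\tau^*) = \ker(\tau)^\perp$, and so
   $\tau$ surjective $\Leftrightarrow$ $\tau^*$ injective
   $\tau$ injective $\Leftrightarrow$ $\tau^*$ surjective
2) $\ker(\tau^*\tau) = \ker(\tau)$ and $\ker(\tau\tau^*) = \ker(\tau^*)$
3) $\operatorname{im}(\tau^*\tau) = \operatorname{im}(\tau^*)$ and $\operatorname{im}(\tau\tau^*) = \operatorname{im}(\tau)$
4) $(\rho_{S,T})^* = \rho_{T^\perp, S^\perp}$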
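Proof. For part 1),
$$\begin{aligned}
u \in \ker(\tau^*) &\Leftrightarrow \tau^* u = 0 \\
&\Leftrightarrow \langle \tau^* u, V\rangle = \{0\} \\
&\Leftrightarrow \langle u, \tau V\rangle = \{0\} \\
&\Leftrightarrow u \in \operatorname{im}(\tau)^\perp
\end{aligned}$$
and so $\ker(\tau^*) = \operatorname{im}(\tau)^\perp$. The second equation in part 1) follows by replacing $\tau$
by $\tau^*$ and taking complements.

For part 2), it is clear that $\ker(\tau) \subseteq \ker(\tau^*\tau)$. For the reverse inclusion, we have
$$\tau^*\tau u = 0 \;\Rightarrow\; \langle \tau^*\tau u, u\rangle = 0 \;\Rightarrow\; \langle \tau u, \tau u\rangle = 0 \;\Rightarrow\; \tau u = 0$$
and so $\ker(\tau^*\tau) \subseteq \ker(\tau)$. The second equation follows from the first by
replacing $\tau$ with $\tau^*$. We leave the rest of the proof for the reader. $\square$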




The Operator Adjoint and the Hilbert Space Adjoint 


We should make some remarks about the relationship between the operator
adjoint $\tau^\times$ of $\tau$, as defined in Chapter 3, and the adjoint $\tau^*$ that we have just
defined, which is sometimes called the Hilbert space adjoint. In the first place,
if $\tau\colon V \to W$, then $\tau^\times$ and $\tau^*$ have different domains and ranges:
$$\tau^\times\colon W^* \to V^* \quad\text{and}\quad \tau^*\colon W \to V$$

The two maps are shown in Figure 10.1, along with the conjugate Riesz
isomorphisms $R^V\colon V^* \to V$ and $R^W\colon W^* \to W$.


Figure 10.1 
The composite map $\sigma\colon W^* \to V^*$ defined by
$$\sigma = (R^V)^{-1} \circ \tau^* \circ R^W$$
is linear. Moreover, for all $f \in W^*$ and $v \in V$,
$$(\sigma(f))v = \langle v, \tau^* R^W f\rangle = \langle \tau v, R^W f\rangle = f(\tau v) = (\tau^\times(f))v$$
and so $\sigma = \tau^\times$. Hence, the relationship between $\tau^\times$ and $\tau^*$ is
$$\tau^\times = (R^V)^{-1} \circ \tau^* \circ R^W$$

Loosely speaking, the Riesz functions are like “change of variables” functions
from linear functionals to vectors, and we can say that $\tau^*$ does to Riesz vectors
what $\tau^\times$ does to the corresponding linear functionals. Put another way (and just
as loosely), $\tau^\times$ and $\tau^*$ are the same, up to conjugate Riesz isomorphism.


In Chapter 3, we showed that the matrix of the operator adjoint $\tau^\times$ is the
transpose of the matrix of the map $\tau$. For Hilbert space adjoints, the situation is
slightly different (due to the conjugate linearity of the inner product). Suppose
that $\mathcal{B} = (b_1,\dots,b_n)$ and $\mathcal{C} = (c_1,\dots,c_m)$ are ordered orthonormal bases for $V$
and $W$, respectively. Then




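$$([\tau^*]_{\mathcal{C},\mathcal{B}})_{i,j} = \langle \tau^* c_j, b_i\rangle = \langle c_j, \tau b_i\rangle = \overline{\langle \tau b_i, c_j\rangle} = \overline{([\tau]_{\mathcal{B},\mathcal{C}})_{j,i}}$$
and so $[\tau^*]_{\mathcal{C},\mathcal{B}}$ and $[\tau]_{\mathcal{B},\mathcal{C}}$ are conjugate transposes. The conjugate transpose of a
matrix $A = (a_{i,j})$ is
$$A^* = (\bar{a}_{j,i})$$
and is called the adjoint of $A$.
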
Theorem 10.4 Let $\tau \in L(V, W)$, where $V$ and $W$ are finite-dimensional inner
product spaces.
1) The operator adjoint $\tau^\times$ and the Hilbert space adjoint $\tau^*$ are related by
$$\tau^\times = (R^V)^{-1} \circ \tau^* \circ R^W$$
where $R^V$ and $R^W$ are the conjugate Riesz isomorphisms on $V$ and $W$,
respectively.
2) If $\mathcal{B}$ and $\mathcal{C}$ are ordered orthonormal bases for $V$ and $W$, respectively, then
$$[\tau^*]_{\mathcal{C},\mathcal{B}} = ([\tau]_{\mathcal{B},\mathcal{C}})^*$$
In words, the matrix of the adjoint $\tau^*$ is the adjoint (conjugate transpose) of
the matrix of $\tau$. $\square$
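
The defining condition of the adjoint can be checked numerically for the standard inner products on $\mathbb{C}^2$ and $\mathbb{C}^3$. The following is a minimal sketch in plain Python; the matrix `A`, the vectors `v` and `w`, and all function names are illustrative choices of ours, not taken from the text.

```python
def conj_transpose(A):
    """Adjoint (conjugate transpose) of a matrix stored as a list of rows."""
    return [[complex(A[i][j]).conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]

def mat_vec(A, v):
    """Matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def inner(x, y):
    """Standard inner product: linear in x, conjugate linear in y."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

# Illustrative data: A is the matrix of a map from C^2 to C^3.
A = [[1 + 2j, 3j], [2, 1 - 1j], [0, 4]]
v = [1 - 1j, 2]
w = [1j, 1 + 1j, -2]

lhs = inner(mat_vec(A, v), w)                   # <Av, w>, computed in C^3
rhs = inner(v, mat_vec(conj_transpose(A), w))   # <v, A*w>, computed in C^2
print(lhs, rhs)                                 # the two values should agree
```

Since the standard bases are orthonormal, the agreement of the two printed values is exactly the statement that the conjugate transpose represents the adjoint.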


Orthogonal Projections 


In an inner product space, we can single out some special projection operators. 


Definition A projection of the form $\rho_{S,S^\perp}$ is said to be orthogonal.
Equivalently, a projection $\rho$ is orthogonal if $\ker(\rho) \perp \operatorname{im}(\rho)$. $\square$

Some care must be taken to avoid confusion between orthogonal projections and
two projections that are orthogonal to each other, that is, for which
$\rho\sigma = \sigma\rho = 0$.

We have seen that an operator $\rho$ is a projection operator if and only if it is
idempotent. Here is the analogous characterization of orthogonal projections.


Theorem 10.5 Let $V$ be a finite-dimensional inner product space. The following
are equivalent for an operator $\rho$ on $V$:
1) $\rho$ is an orthogonal projection
2) $\rho$ is idempotent and self-adjoint
3) $\rho$ is idempotent and does not expand lengths, that is
$$\|\rho v\| \le \|v\|$$
for all $v \in V$.




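Proof. Since
$$(\rho_{S,T})^* = \rho_{T^\perp,S^\perp}$$
it follows that $\rho_{S,T} = (\rho_{S,T})^*$ if and only if $S^\perp = T$, that is, if and only if $\rho_{S,T}$ is
orthogonal. Hence, 1) and 2) are equivalent.
To prove that 1) implies 3), let $\rho = \rho_{S,S^\perp}$. Then if $v = s + t$ for $s \in S$ and
$t \in S^\perp$, it follows that
$$\|v\|^2 = \|s\|^2 + \|t\|^2 \ge \|s\|^2 = \|\rho v\|^2$$
Now suppose that 3) holds. Then
$$\operatorname{im}(\rho) \oplus \ker(\rho) = V = \ker(\rho)^\perp \odot \ker(\rho)$$
and we wish to show that the first sum is orthogonal. If $w \in \operatorname{im}(\rho)$, then
$w = x + y$, where $x \in \ker(\rho)$ and $y \in \ker(\rho)^\perp$. Hence,
$$w = \rho w = \rho x + \rho y = \rho y$$
and so the orthogonality of $x$ and $y$ implies that
$$\|x\|^2 + \|y\|^2 = \|w\|^2 = \|\rho y\|^2 \le \|y\|^2$$
Hence, $x = 0$ and so $\operatorname{im}(\rho) \subseteq \ker(\rho)^\perp$, which implies that $\operatorname{im}(\rho) = \ker(\rho)^\perp$. $\square$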
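
The three conditions of Theorem 10.5 can be compared on small examples. The sketch below uses exact rational arithmetic and contrasts an orthogonal projection of $\mathbb{R}^3$ onto a line with an oblique projection of $\mathbb{R}^2$; both matrices and all helper names are illustrative assumptions of ours, not objects defined in the text.

```python
from fractions import Fraction

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def apply(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def norm_sq(v):
    return sum(x * x for x in v)

def projection_onto_line(u):
    """Matrix u u^t / <u, u> of the orthogonal projection of R^n onto span{u}."""
    d = Fraction(norm_sq(u))
    return [[Fraction(a * b) / d for b in u] for a in u]

P = projection_onto_line([1, 2, 2])   # orthogonal projection
Q = [[Fraction(1), Fraction(1)],      # oblique projection onto span{(1, 0)}
     [Fraction(0), Fraction(0)]]      # along span{(1, -1)}

for name, M, v in [("P", P, [3, 0, -1]), ("Q", Q, [1, 1])]:
    idempotent = mat_mul(M, M) == M
    self_adjoint = transpose(M) == M
    contracts = norm_sq(apply(M, v)) <= norm_sq(v)
    print(name, idempotent, self_adjoint, contracts)
# prints "P True True True" and then "Q True False False"
```

The oblique projection `Q` is idempotent but fails both of the other tests, as the theorem predicts for a projection whose kernel is not orthogonal to its image.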

Orthogonal Resolutions of the Identity

We have seen (Theorem 2.25) that resolutions of the identity
$$\rho_1 + \cdots + \rho_k = \iota$$
on $V$ correspond to direct sum decompositions of $V$. If, in addition, the
projections are orthogonal, then the direct sum is an orthogonal sum.

Definition An orthogonal resolution of the identity is a resolution of the
identity $\rho_1 + \cdots + \rho_k = \iota$ in which each projection $\rho_i$ is orthogonal. $\square$


The following theorem displays a correspondence between orthogonal direct 
sum decompositions of V and orthogonal resolutions of the identity. 


Theorem 10.6 Let $V$ be an inner product space. Orthogonal resolutions of the
identity on $V$ correspond to orthogonal direct sum decompositions of $V$ as
follows:
1) If $\rho_1 + \cdots + \rho_k = \iota$ is an orthogonal resolution of the identity, then
$$V = \operatorname{im}(\rho_1) \odot \cdots \odot \operatorname{im}(\rho_k)$$
and $\rho_i$ is orthogonal projection onto $\operatorname{im}(\rho_i)$.




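2) Conversely, if
$$V = S_1 \odot \cdots \odot S_k$$
and if $\rho_i$ is orthogonal projection onto $S_i$, then $\rho_1 + \cdots + \rho_k = \iota$ is an
orthogonal resolution of the identity.
Proof. To prove 1), if $\rho_1 + \cdots + \rho_k = \iota$ is an orthogonal resolution of the
identity, Theorem 2.25 implies that
$$V = \operatorname{im}(\rho_1) \oplus \cdots \oplus \operatorname{im}(\rho_k)$$
However, since the $\rho_i$'s are pairwise orthogonal and self-adjoint, it follows that for $i \ne j$,
$$\langle \rho_i v, \rho_j w\rangle = \langle v, \rho_i\rho_j w\rangle = \langle v, 0\rangle = 0$$
and so
$$V = \operatorname{im}(\rho_1) \odot \cdots \odot \operatorname{im}(\rho_k)$$
For the converse, Theorem 2.25 implies that $\rho_1 + \cdots + \rho_k = \iota$ is a resolution of
the identity where $\rho_i$ is projection onto $\operatorname{im}(\rho_i)$ along
$$\ker(\rho_i) = \bigodot_{j \ne i} \operatorname{im}(\rho_j) = \operatorname{im}(\rho_i)^\perp$$
Hence, $\rho_i$ is orthogonal. $\square$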

Unitary Diagonalizability

We have seen (Theorem 8.10) that a linear operator $\tau \in L(V)$ on a finite-dimensional
vector space $V$ is diagonalizable if and only if
$$V = \mathcal{E}_{\lambda_1} \oplus \cdots \oplus \mathcal{E}_{\lambda_k}$$
Of course, each eigenspace $\mathcal{E}_{\lambda_i}$ has an orthonormal basis $\mathcal{O}_i$, but the union of
these bases need not be an orthonormal basis for $V$.

Definition A linear operator $\tau \in L(V)$ is unitarily diagonalizable (when $V$ is
complex) and orthogonally diagonalizable (when $V$ is real) if there is an
ordered orthonormal basis $\mathcal{O} = (u_1,\dots,u_n)$ of $V$ for which the matrix $[\tau]_{\mathcal{O}}$ is
diagonal, or equivalently, if
$$\tau u_i = \lambda_i u_i$$
for all $i = 1,\dots,n$. $\square$

Here is the counterpart of Theorem 8.10 for inner product spaces. 


Theorem 10.7 Let $V$ be a finite-dimensional inner product space and let
$\tau \in L(V)$. The following are equivalent:
1) $\tau$ is unitarily (orthogonally) diagonalizable.
2) $V$ has an orthonormal basis that consists entirely of eigenvectors of $\tau$.




3) $V$ has the form
$$V = \mathcal{E}_{\lambda_1} \odot \cdots \odot \mathcal{E}_{\lambda_k}$$
where $\lambda_1,\dots,\lambda_k$ are the distinct eigenvalues of $\tau$. $\square$

For simplicity in exposition, we will tend to use the term unitarily
diagonalizable for both cases. Since unitarily diagonalizable operators are so
well behaved, it is natural to seek a characterization of such operators.
Remarkably, there is a simple one, as we will see next.


Normal Operators

Operators that commute with their own adjoints are very special.

Definition
1) A linear operator $\tau$ on an inner product space $V$ is normal if it commutes
with its adjoint:
$$\tau\tau^* = \tau^*\tau$$
2) A matrix $A \in M_n(F)$ is normal if $A$ commutes with its adjoint $A^*$. $\square$

If $\tau$ is normal and $\mathcal{O}$ is an ordered orthonormal basis of $V$, then
$$[\tau]_{\mathcal{O}}[\tau]_{\mathcal{O}}^* = [\tau]_{\mathcal{O}}[\tau^*]_{\mathcal{O}} = [\tau\tau^*]_{\mathcal{O}}$$
and
$$[\tau]_{\mathcal{O}}^*[\tau]_{\mathcal{O}} = [\tau^*]_{\mathcal{O}}[\tau]_{\mathcal{O}} = [\tau^*\tau]_{\mathcal{O}}$$
and so $\tau$ is normal if and only if $[\tau]_{\mathcal{O}}$ is normal for some, and hence all,
orthonormal bases for $V$. Note that this does not hold for bases that are not
orthonormal.
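
For matrices, normality is a direct computation: form $AA^*$ and $A^*A$ and compare. The following sketch does this in plain Python; the three test matrices and the tolerance parameter `tol` are illustrative choices of ours.

```python
def conj_transpose(A):
    """Adjoint (conjugate transpose) of a matrix stored as a list of rows."""
    return [[complex(A[i][j]).conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def is_normal(A, tol=1e-12):
    """True if A A* and A* A agree entrywise up to tol."""
    A_star = conj_transpose(A)
    left, right = mat_mul(A, A_star), mat_mul(A_star, A)
    return all(abs(x - y) <= tol
               for row_l, row_r in zip(left, right)
               for x, y in zip(row_l, row_r))

rotation_like = [[1, -1], [1, 1]]        # normal, but neither Hermitian nor unitary
hermitian = [[2, 1 - 1j], [1 + 1j, 3]]   # Hermitian, hence normal
shear = [[1, 1], [0, 1]]                 # not normal

for A in (rotation_like, hermitian, shear):
    print(is_normal(A))
```

The first matrix shows that the class of normal matrices is strictly larger than the self-adjoint and unitary ones.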


Normal operators have some very special properties. 


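Theorem 10.8 Let $\tau \in L(V)$ be normal.
1) The following are also normal:
   a) $\tau|_S$, if $\tau$ reduces $(S, S^\perp)$
   b) $\tau^*$
   c) $\tau^{-1}$, if $\tau$ is invertible
   d) $p(\tau)$, for any polynomial $p(x) \in F[x]$
2) For any $v, w \in V$,
$$\langle \tau v, \tau w\rangle = \langle \tau^* v, \tau^* w\rangle$$
and, in particular,
$$\|\tau v\| = \|\tau^* v\|$$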




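and so
$$\ker(\tau^*) = \ker(\tau)$$
3) For any integer $k \ge 1$,
$$\ker(\tau^k) = \ker(\tau)$$
4) The minimal polynomial $m_\tau(x)$ is a product of distinct prime monic
polynomials.
5) $\tau v = \lambda v \;\Rightarrow\; \tau^* v = \bar{\lambda} v$
6) If $S$ and $T$ are submodules of $V_\tau$ with relatively prime orders, then $S \perp T$.
7) If $\lambda$ and $\mu$ are distinct eigenvalues of $\tau$, then $\mathcal{E}_\lambda \perp \mathcal{E}_\mu$.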
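Proof. We leave part 1) for the reader. For part 2), normality implies that
$$\langle \tau v, \tau w\rangle = \langle \tau^*\tau v, w\rangle = \langle \tau\tau^* v, w\rangle = \langle \tau^* v, \tau^* w\rangle$$
We prove part 3) first for the operator $\sigma = \tau^*\tau$, which is self-adjoint, that is,
$$\sigma^* = (\tau^*\tau)^* = \tau^*\tau^{**} = \tau^*\tau = \sigma$$
If $\sigma^k v = 0$ for $k > 1$, then
$$0 = \langle \sigma^k v, \sigma^{k-2} v\rangle = \langle \sigma^{k-1} v, \sigma^{k-1} v\rangle$$
and so $\sigma^{k-1} v = 0$. Continuing in this way gives $\sigma v = 0$. Now, if $\tau^k v = 0$ for
$k > 1$, then
$$\sigma^k v = (\tau^*\tau)^k v = (\tau^*)^k\tau^k v = 0$$
and so $\sigma v = 0$. Hence,
$$0 = \langle \sigma v, v\rangle = \langle \tau^*\tau v, v\rangle = \langle \tau v, \tau v\rangle$$
and so $\tau v = 0$.

For part 4), suppose that
$$m_\tau(x) = p^e(x)q(x)$$
where $p(x)$ is monic and prime and does not divide $q(x)$. Then for any $v \in V$,
$$p^e(\tau)[q(\tau)v] = 0$$
and since $p(\tau)$ is also normal, part 3) implies that
$$p(\tau)[q(\tau)v] = 0$$
for all $v \in V$. Hence, $p(\tau)q(\tau) = 0$, which implies that $e = 1$. Thus, the prime
factors of $m_\tau(x)$ appear only to the first power.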




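Part 5) follows from part 2), applied to the normal operator $\tau - \lambda$:
$$\ker(\tau - \lambda) = \ker[(\tau - \lambda)^*] = \ker(\tau^* - \bar{\lambda})$$
For part 6), if $o(S) = p(x)$ and $o(T) = q(x)$, then there are polynomials $a(x)$
and $b(x)$ for which $a(x)p(x) + b(x)q(x) = 1$ and so
$$a(\tau)p(\tau) + b(\tau)q(\tau) = \iota$$
Now, $\alpha = a(\tau)p(\tau)$ annihilates $S$ and $\beta = b(\tau)q(\tau)$ annihilates $T$. Therefore
$\beta^*$ also annihilates $T$ and so
$$\langle S, T\rangle = \langle (\alpha + \beta)S, T\rangle = \langle \beta S, T\rangle = \langle S, \beta^* T\rangle = \{0\}$$
Part 7) follows from part 6), since $o(\mathcal{E}_\lambda) = x - \lambda$ and $o(\mathcal{E}_\mu) = x - \mu$ are
relatively prime when $\lambda \ne \mu$. Alternatively, for $v \in \mathcal{E}_\lambda$ and $w \in \mathcal{E}_\mu$, we have
$$\lambda\langle v, w\rangle = \langle \tau v, w\rangle = \langle v, \tau^* w\rangle = \langle v, \bar{\mu}w\rangle = \mu\langle v, w\rangle$$
and so $\lambda \ne \mu$ implies that $\langle v, w\rangle = 0$. $\square$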

The Spectral Theorem for Normal Operators 


Theorem 10.8 implies that when $F = \mathbb{C}$, the minimal polynomial $m_\tau(x)$ splits
into distinct linear factors and so Theorem 8.11 implies that $\tau$ is diagonalizable,
that is,
$$V_\tau = \mathcal{E}_{\lambda_1} \oplus \cdots \oplus \mathcal{E}_{\lambda_k}$$
Moreover, since distinct eigenspaces of a normal operator are orthogonal, we
have
$$V_\tau = \mathcal{E}_{\lambda_1} \odot \cdots \odot \mathcal{E}_{\lambda_k}$$
and so $\tau$ is unitarily diagonalizable.

The converse of this is also true. If $V$ has an orthonormal basis
$\mathcal{O} = \{u_1,\dots,u_n\}$ of eigenvectors for $\tau$, then since $[\tau]_{\mathcal{O}}$ and $[\tau^*]_{\mathcal{O}} = [\tau]_{\mathcal{O}}^*$ are
diagonal, these matrices commute and therefore so do $\tau^*$ and $\tau$.


Theorem 10.9 (The spectral theorem for normal operators: complex case)
Let $V$ be a finite-dimensional complex inner product space and let $\tau \in L(V)$.
The following are equivalent:
1) $\tau$ is normal.
2) $\tau$ is unitarily diagonalizable, that is,
$$V_\tau = \mathcal{E}_{\lambda_1} \odot \cdots \odot \mathcal{E}_{\lambda_k}$$




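3) $\tau$ has an orthogonal spectral resolution
$$\tau = \lambda_1\rho_1 + \cdots + \lambda_k\rho_k \tag{10.1}$$
where $\rho_1 + \cdots + \rho_k = \iota$ and $\rho_i$ is orthogonal for all $i$, in which case,
$\{\lambda_1,\dots,\lambda_k\}$ is the spectrum of $\tau$ and
$$\operatorname{im}(\rho_i) = \mathcal{E}_{\lambda_i} \quad\text{and}\quad \ker(\rho_i) = \bigodot_{j \ne i} \mathcal{E}_{\lambda_j}$$
Proof. We have seen that 1) and 2) are equivalent. To see that 2) and 3) are
equivalent, Theorem 8.12 says that
$$V_\tau = \mathcal{E}_{\lambda_1} \oplus \cdots \oplus \mathcal{E}_{\lambda_k}$$
if and only if
$$\tau = \lambda_1\rho_1 + \cdots + \lambda_k\rho_k$$
and in this case,
$$\operatorname{im}(\rho_i) = \mathcal{E}_{\lambda_i} \quad\text{and}\quad \ker(\rho_i) = \bigoplus_{j \ne i} \mathcal{E}_{\lambda_j}$$
But $\mathcal{E}_{\lambda_i} \perp \mathcal{E}_{\lambda_j}$ for $i \ne j$ if and only if
$$\operatorname{im}(\rho_i) \perp \ker(\rho_i)$$
that is, if and only if each $\rho_i$ is orthogonal. Hence, the direct sum
$V_\tau = \mathcal{E}_{\lambda_1} \oplus \cdots \oplus \mathcal{E}_{\lambda_k}$ is an orthogonal sum if and only if each projection is
orthogonal. $\square$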
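
As a small worked illustration of an orthogonal spectral resolution, consider a $2 \times 2$ Hermitian (hence normal) matrix acting on $\mathbb{C}^2$ with the standard inner product. The matrix $A$ below is chosen here for convenience and does not come from the text; $\rho_1$ and $\rho_2$ denote the orthogonal projections onto its two eigenspaces.

```latex
A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}, \qquad
\lambda_1 = 3,\ \mathcal{E}_{3} = \langle (1,1) \rangle, \qquad
\lambda_2 = 1,\ \mathcal{E}_{1} = \langle (1,-1) \rangle

\rho_1 = \tfrac{1}{2}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \qquad
\rho_2 = \tfrac{1}{2}\begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}

\rho_1 + \rho_2 = I, \qquad
\rho_1\rho_2 = \rho_2\rho_1 = 0, \qquad
\rho_i^* = \rho_i = \rho_i^2

3\rho_1 + 1\rho_2
  = \begin{bmatrix} \tfrac{3}{2} + \tfrac{1}{2} & \tfrac{3}{2} - \tfrac{1}{2} \\
                    \tfrac{3}{2} - \tfrac{1}{2} & \tfrac{3}{2} + \tfrac{1}{2} \end{bmatrix}
  = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} = A
```

The eigenvectors $(1,1)$ and $(1,-1)$ are orthogonal, so the two eigenspaces form an orthogonal direct sum of $\mathbb{C}^2$, in agreement with part 2) of the theorem.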

The Real Case 

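If $F = \mathbb{R}$, then $m_\tau(x)$ has the form
$$m_\tau(x) = (x - \lambda_1)\cdots(x - \lambda_k)\,p_1(x)\cdots p_m(x)$$
where each $p_i(x)$ is an irreducible monic quadratic. Hence, the primary cyclic
decomposition of $V_\tau$ gives
$$V_\tau = \mathcal{E}_{\lambda_1} \odot \cdots \odot \mathcal{E}_{\lambda_k} \odot W_1 \odot \cdots \odot W_m$$
where $W_i$ is cyclic with prime quadratic order $p_i(x)$. Therefore, Theorem 8.8
implies that there is an ordered basis $\mathcal{B}_i$ for which
$$[\tau|_{W_i}]_{\mathcal{B}_i} = \begin{bmatrix} a_i & -b_i \\ b_i & a_i \end{bmatrix}$$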


Theorem 10.10 (The spectral theorem for normal operators: real case) A
linear operator $\tau$ on a finite-dimensional real inner product space is normal if
and only if




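$$V = \mathcal{E}_{\lambda_1} \odot \cdots \odot \mathcal{E}_{\lambda_k} \odot W_1 \odot \cdots \odot W_m$$
where $\{\lambda_1,\dots,\lambda_k\}$ is the spectrum of $\tau$ and each $W_i$ is an indecomposable
two-dimensional $\tau$-invariant subspace with an ordered basis $\mathcal{B}_i$ for which
$$[\tau]_{\mathcal{B}_i} = \begin{bmatrix} a_i & -b_i \\ b_i & a_i \end{bmatrix}$$
Proof. We need only show that if $V$ has such a decomposition, then $\tau$ is normal.
But
$$[\tau]_{\mathcal{B}_i}[\tau]_{\mathcal{B}_i}^t = (a_i^2 + b_i^2)I_2 = [\tau]_{\mathcal{B}_i}^t[\tau]_{\mathcal{B}_i}$$
and so $[\tau]_{\mathcal{B}_i}$ is normal. It follows easily that $\tau$ is normal. $\square$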


Special Types of Normal Operators 


We now want to introduce some special types of normal operators. 


Definition Let $V$ be an inner product space.
1) $\tau \in L(V)$ is self-adjoint (also called Hermitian in the complex case and
symmetric in the real case) if
$$\tau^* = \tau$$
2) $\tau \in L(V)$ is skew self-adjoint (also called skew-Hermitian in the
complex case and skew-symmetric in the real case) if
$$\tau^* = -\tau$$
3) $\tau \in L(V)$ is unitary in the complex case and orthogonal in the real case if
$\tau$ is invertible and
$$\tau^* = \tau^{-1}$$
$\square$


There are also matrix versions of these definitions, obtained simply by replacing
the operator $\tau$ by a matrix $A$. Moreover, the operator $\tau$ is self-adjoint if and only
if any matrix that represents $\tau$ with respect to an ordered orthonormal basis $\mathcal{O}$ is
self-adjoint. Similar statements hold for the other types of operators in the
previous definition.


In some sense, square complex matrices are a generalization of complex 
numbers and the adjoint (conjugate transpose) is a generalization of the complex 
conjugate. In looking for a better analogy, we could consider just the diagonal 
matrices, but this is a bit too restrictive. The next logical choice is the set M of 
normal matrices. 


Indeed, among the complex numbers, there are some special subsets: the real
numbers, the positive numbers and the numbers on the unit circle. We will soon
see that a complex normal matrix $A$ is self-adjoint if and only if its complex eigenvalues




are real. This would suggest that the analog of the set of real numbers is the set
of self-adjoint matrices. Also, we will see that a complex normal matrix is unitary if and
only if its eigenvalues have norm 1, so numbers on the unit circle seem to
correspond to the set of unitary matrices. This leaves open the question of which
normal matrices correspond to the positive real numbers. These are the positive
definite matrices, which we will discuss later in the chapter.


Self-Adjoint Operators 


Let us consider the basic properties of self-adjoint operators. The quadratic
form associated with the linear operator $\tau$ is the function $Q_\tau\colon V \to F$ defined
by
$$Q_\tau(v) = \langle \tau v, v\rangle$$

We have seen (Theorem 9.2) that in a complex inner product space, $\tau = 0$ if and
only if $Q_\tau = 0$ but this does not hold, in general, for real inner product spaces.
However, it does hold for symmetric operators on a real inner product space.


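Theorem 10.11 Let $V$ be a finite-dimensional inner product space and let
$\sigma, \tau \in L(V)$.
1) If $\tau$ and $\sigma$ are self-adjoint, then so are the following:
   a) $\sigma + \tau$
   b) $\tau^{-1}$, if $\tau$ is invertible
   c) $p(\tau)$, for any real polynomial $p(x) \in \mathbb{R}[x]$
2) A complex operator $\tau$ is Hermitian if and only if $Q_\tau(v)$ is real for all
$v \in V$.
3) If $\tau$ is a complex operator or a real symmetric operator, then
$$\tau = 0 \;\Leftrightarrow\; Q_\tau = 0$$
4) The characteristic polynomial $c_\tau(x)$ of a self-adjoint operator $\tau$ splits over
$\mathbb{R}$, that is, all complex roots of $c_\tau(x)$ are real. Hence, the minimal
polynomial $m_\tau(x)$ of $\tau$ is the product of distinct monic linear factors over
$\mathbb{R}$.
Proof. For part 2), if $\tau$ is Hermitian, then
$$\langle \tau v, v\rangle = \langle v, \tau v\rangle = \overline{\langle \tau v, v\rangle}$$
and so $Q_\tau(v) = \langle \tau v, v\rangle$ is real. Conversely, if $\langle \tau v, v\rangle \in \mathbb{R}$, then
$$\langle v, \tau v\rangle = \langle \tau v, v\rangle = \langle v, \tau^* v\rangle$$
and so $Q_{\tau - \tau^*} = 0$, whence $\tau = \tau^*$ by Theorem 9.2.

For part 3), we need only prove that $Q_\tau = 0$ implies $\tau = 0$ when $F = \mathbb{R}$. But if
$Q_\tau = 0$, then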




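$$\begin{aligned}
0 &= \langle \tau(x + y), x + y\rangle \\
&= \langle \tau x, x\rangle + \langle \tau y, y\rangle + \langle \tau x, y\rangle + \langle \tau y, x\rangle \\
&= \langle \tau x, y\rangle + \langle \tau y, x\rangle \\
&= \langle \tau x, y\rangle + \langle x, \tau y\rangle \\
&= \langle \tau x, y\rangle + \langle \tau x, y\rangle \\
&= 2\langle \tau x, y\rangle
\end{aligned}$$
for all $x, y \in V$ and so $\tau = 0$.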


For part 4), if $\tau$ is Hermitian ($F = \mathbb{C}$) and $\tau v = \lambda v$ with $v \ne 0$, then
$$\lambda v = \tau v = \tau^* v = \bar{\lambda} v$$
and so $\lambda = \bar{\lambda}$ is real. If $\tau$ is symmetric ($F = \mathbb{R}$), we must be a bit careful, since
a nonreal root of $c_\tau(x)$ is not an eigenvalue of $\tau$. However, matrix techniques
can come to the rescue here. If $A = [\tau]_{\mathcal{O}}$ for any ordered orthonormal basis $\mathcal{O}$
for $V$, then $c_\tau(x) = c_A(x)$. Now, $A$ is a real symmetric matrix, but can be
thought of as a complex Hermitian matrix with real entries. As such, it
represents a Hermitian linear operator on the complex space $\mathbb{C}^n$ and so, by what
we have just shown, all (complex) roots of its characteristic polynomial are real.
But the characteristic polynomial of $A$ is the same, whether we think of $A$ as a
real or a complex matrix and so the result follows. $\square$


Unitary Operators and Isometries 


We now turn to the basic properties of unitary operators. These are the
workhorse operators, in that a unitary operator is precisely a normal operator
that maps orthonormal bases to orthonormal bases.

Note that $\tau$ is unitary if and only if
$$\langle \tau v, w\rangle = \langle v, \tau^{-1} w\rangle$$
for all $v, w \in V$.


Theorem 10.12 Let $V$ be a finite-dimensional inner product space and let
$\sigma, \tau \in L(V)$.
1) If $\tau$ and $\sigma$ are unitary/orthogonal, then so are the following:
   a) $r\tau$, for $r \in \mathbb{C}$, $|r| = 1$
   b) $\sigma\tau$
   c) $\tau^{-1}$, if $\tau$ is invertible.
2) $\tau$ is unitary/orthogonal if and only if it is an isometric isomorphism.
3) $\tau$ is unitary/orthogonal if and only if it takes some orthonormal basis to an
orthonormal basis, in which case it takes all orthonormal bases to
orthonormal bases.
4) If $\tau$ is unitary/orthogonal, then the eigenvalues of $\tau$ have absolute value 1.




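Proof. We leave the proof of part 1) to the reader. For part 2), a
unitary/orthogonal map is injective and since $V$ is finite-dimensional, it is
bijective. Moreover, for a bijective linear map $\tau$, we have
$$\begin{aligned}
\tau \text{ is an isometry} &\Leftrightarrow \langle \tau v, \tau w\rangle = \langle v, w\rangle \text{ for all } v, w \in V \\
&\Leftrightarrow \langle v, \tau^*\tau w\rangle = \langle v, w\rangle \text{ for all } v, w \in V \\
&\Leftrightarrow \tau^*\tau = \iota \\
&\Leftrightarrow \tau^* = \tau^{-1} \\
&\Leftrightarrow \tau \text{ is unitary/orthogonal}
\end{aligned}$$

For part 3), suppose that $\tau$ is unitary/orthogonal and that $\mathcal{O} = \{u_1,\dots,u_n\}$ is an
orthonormal basis for $V$. Then
$$\langle \tau u_i, \tau u_j\rangle = \langle u_i, u_j\rangle = \delta_{i,j}$$
and so $\tau\mathcal{O}$ is an orthonormal basis for $V$. Conversely, suppose that $\mathcal{O}$ and $\tau\mathcal{O}$
are orthonormal bases for $V$. Then
$$\langle \tau u_i, \tau u_j\rangle = \delta_{i,j} = \langle u_i, u_j\rangle$$
which implies that $\langle \tau v, \tau w\rangle = \langle v, w\rangle$ for all $v, w \in V$ and so $\tau$ is
unitary/orthogonal.

For part 4), if $\tau$ is unitary and $\tau v = \lambda v$, then
$$\lambda\bar{\lambda}\langle v, v\rangle = \langle \lambda v, \lambda v\rangle = \langle \tau v, \tau v\rangle = \langle v, v\rangle$$
and so $|\lambda|^2 = \lambda\bar{\lambda} = 1$, which implies that $|\lambda| = 1$. $\square$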


We also have the following theorem concerning unitary (and orthogonal) 
matrices. 


Theorem 10.13 Let $A$ be an $n \times n$ matrix over $F = \mathbb{C}$ or $F = \mathbb{R}$.
1) The following are equivalent:
   a) $A$ is unitary/orthogonal.
   b) The columns of $A$ form an orthonormal set in $F^n$.
   c) The rows of $A$ form an orthonormal set in $F^n$.
2) If $A$ is unitary, then $|\det(A)| = 1$. If $A$ is orthogonal, then $\det(A) = \pm 1$.
Proof. The matrix $A$ is unitary if and only if $AA^* = I$, which is equivalent to
the rows of $A$ being orthonormal. Similarly, $A$ is unitary if and only if
$A^*A = I$, which is equivalent to the columns of $A$ being orthonormal. As for
part 2),
$$AA^* = I \;\Rightarrow\; \det(A)\det(A^*) = 1 \;\Rightarrow\; \det(A)\overline{\det(A)} = 1$$
from which the result follows. $\square$
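
For real matrices, both parts of the theorem can be checked exactly with rational arithmetic. The sketch below tests one rotation and one reflection of $\mathbb{R}^2$; the two matrices and the helper names are illustrative choices of ours, not from the text.

```python
from fractions import Fraction as Fr

def transpose(A):
    return [list(col) for col in zip(*A)]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def identity(n):
    return [[Fr(int(i == j)) for j in range(n)] for i in range(n)]

def det2(A):
    """Determinant of a 2 x 2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

rotation = [[Fr(3, 5), Fr(-4, 5)], [Fr(4, 5), Fr(3, 5)]]
reflection = [[Fr(3, 5), Fr(4, 5)], [Fr(4, 5), Fr(-3, 5)]]

for A in (rotation, reflection):
    rows_orthonormal = mat_mul(A, transpose(A)) == identity(2)   # A A^t = I
    cols_orthonormal = mat_mul(transpose(A), A) == identity(2)   # A^t A = I
    print(rows_orthonormal, cols_orthonormal, det2(A))
# prints "True True 1" and then "True True -1"
```

Both signs of the determinant occur, so $\det(A) = \pm 1$ cannot be sharpened for orthogonal matrices.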




Unitary/orthogonal matrices play the role of change of basis matrices when we
restrict attention to orthonormal bases. Let us first note that if $\mathcal{B} = (u_1,\dots,u_n)$
is an ordered orthonormal basis and
$$\begin{aligned}
v &= a_1u_1 + \cdots + a_nu_n \\
w &= b_1u_1 + \cdots + b_nu_n
\end{aligned}$$
then
$$\langle v, w\rangle = a_1\bar{b}_1 + \cdots + a_n\bar{b}_n = [v]_{\mathcal{B}} \cdot [w]_{\mathcal{B}}$$
where the right hand side is the standard inner product in $F^n$ and so $v \perp w$ if
and only if $[v]_{\mathcal{B}} \perp [w]_{\mathcal{B}}$. We can now state the analog of Theorem 2.9.


Theorem 10.14 If we are given any two of the following:
1) A unitary/orthogonal $n \times n$ matrix $A$,
2) An ordered orthonormal basis $\mathcal{B}$ for $F^n$,
3) An ordered orthonormal basis $\mathcal{C}$ for $F^n$,
then the third is uniquely determined by the equation
$$A = M_{\mathcal{B},\mathcal{C}}$$
Proof. Let $\mathcal{B} = \{b_i\}$ be a basis for $V$. If $\mathcal{C}$ is an orthonormal basis for $V$, then
$$\langle b_i, b_j\rangle = [b_i]_{\mathcal{C}} \cdot [b_j]_{\mathcal{C}}$$
where $[b_i]_{\mathcal{C}}$ is the $i$th column of $A = M_{\mathcal{B},\mathcal{C}}$. Hence, $A$ is unitary if and only if $\mathcal{B}$
is orthonormal. We leave the rest of the proof to the reader. $\square$

Unitary Similarity

We have seen that the change of basis formula for operators is given by
$$[\tau]_{\mathcal{C}} = P[\tau]_{\mathcal{B}}P^{-1}$$
where $P$ is an invertible matrix. What happens when the bases are orthonormal?

Definition
1) Two complex matrices $A$ and $B$ are unitarily similar (also called
unitarily equivalent) if there exists a unitary matrix $U$ for which
$$B = UAU^{-1} = UAU^*$$
The equivalence classes associated with unitary similarity are called
unitary similarity classes.
2) Similarly, two real matrices $A$ and $B$ are orthogonally similar (also called
orthogonally equivalent) if there exists an orthogonal matrix $O$ for which
$$B = OAO^{-1} = OAO^t$$
The equivalence classes associated with orthogonal similarity are called
orthogonal similarity classes. $\square$




The analog of Theorem 2.19 is the following. 


Theorem 10.15 Let $V$ be an inner product space of dimension $n$. Then two
$n \times n$ matrices $A$ and $B$ are unitarily/orthogonally similar if and only if they
represent the same linear operator $\tau \in L(V)$ with respect to (possibly different)
ordered orthonormal bases. In this case, $A$ and $B$ represent exactly the same
set of linear operators in $L(V)$ with respect to ordered orthonormal bases.
Proof. If $A$ and $B$ represent $\tau \in L(V)$, that is, if
$$A = [\tau]_{\mathcal{B}} \quad\text{and}\quad B = [\tau]_{\mathcal{C}}$$
for ordered orthonormal bases $\mathcal{B}$ and $\mathcal{C}$, then
$$B = M_{\mathcal{B},\mathcal{C}}AM_{\mathcal{C},\mathcal{B}}$$
and according to Theorem 10.14, $M_{\mathcal{B},\mathcal{C}}$ is unitary/orthogonal. Hence, $A$ and $B$
are unitarily/orthogonally similar.
Now suppose that $A$ and $B$ are unitarily/orthogonally similar, say
$$B = UAU^{-1}$$
where $U$ is unitary/orthogonal. Suppose also that $A$ represents a linear operator
$\tau \in L(V)$ for some ordered orthonormal basis $\mathcal{B}$, that is,
$$A = [\tau]_{\mathcal{B}}$$
Theorem 10.14 implies that there is a unique ordered orthonormal basis $\mathcal{C}$ for $V$
for which $U = M_{\mathcal{B},\mathcal{C}}$. Hence
$$B = M_{\mathcal{B},\mathcal{C}}[\tau]_{\mathcal{B}}M_{\mathcal{B},\mathcal{C}}^{-1} = [\tau]_{\mathcal{C}}$$
and so $B$ also represents $\tau$. By symmetry, we see that $A$ and $B$ represent the
same set of linear operators, under all possible ordered orthonormal bases. $\square$


We have shown (see the discussion of Schur's theorem) that any complex matrix 
A is unitarily similar to an upper triangular matrix, that is, that A is unitarily 
upper triangularizable. However, upper triangular matrices do not form a set of 
canonical forms under unitary similarity. Indeed, the subject of canonical forms 
for unitary similarity is rather complicated and we will not discuss it in this 
book, but instead refer the reader to the survey article [28]. 


Reflections 


The following defines a very special type of unitary operator. 




Definition For a nonzero $v \in V$, the unique operator $H_v$ for which
$$H_v v = -v, \qquad H_v w = w \text{ for all } w \in \langle v\rangle^\perp$$
is called a reflection or a Householder transformation. $\square$

It is easy to verify that
$$H_v x = x - \frac{2\langle x, v\rangle}{\langle v, v\rangle}\,v$$
Moreover, $H_v x = -x$ for $x \ne 0$ if and only if $x = av$ for some $a \in F$ and so
we can identify $v$, up to a nonzero scalar multiple, by the behavior of the reflection on $V$.

If $H_v$ is a reflection and if we extend $v$ to an ordered orthonormal basis $\mathcal{B}$ for $V$,
then $[H_v]_{\mathcal{B}}$ is the matrix obtained from the identity matrix by replacing the upper
left entry by $-1$,
$$[H_v]_{\mathcal{B}} = \begin{bmatrix} -1 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{bmatrix}$$
Thus, a reflection is both unitary and Hermitian, that is,
$$H_v^* = H_v^{-1} = H_v$$


Given two nonzero vectors of equal length, there is precisely one reflection that 
interchanges these vectors. 


Theorem 10.16 Let $v, w \in V$ be distinct nonzero vectors of equal length. Then
$H_{v-w}$ is the unique reflection sending $v$ to $w$ and $w$ to $v$.
Proof. If $\|v\| = \|w\|$, then $(v - w) \perp (v + w)$ and so
$$\begin{aligned}
H_{v-w}(v - w) &= w - v \\
H_{v-w}(v + w) &= v + w
\end{aligned}$$
from which it follows that $H_{v-w}(v) = w$ and $H_{v-w}(w) = v$. As to uniqueness,
suppose $H_x$ is a reflection for which $H_x(v) = w$. Since $H_x^{-1} = H_x$, we have
$H_x(w) = v$ and so
$$H_x(v - w) = -(v - w)$$
which implies that $H_x = H_{v-w}$. $\square$
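
The formula $H_v x = x - \frac{2\langle x, v\rangle}{\langle v, v\rangle}v$ makes Theorem 10.16 easy to test in the real case. The following sketch uses exact rational arithmetic on $\mathbb{R}^2$; the vectors and the function names `inner` and `reflect` are illustrative assumptions of ours.

```python
from fractions import Fraction

def inner(x, y):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def reflect(v, x):
    """Apply H_v to x via H_v x = x - (2<x, v>/<v, v>) v, for nonzero v."""
    c = Fraction(2 * inner(x, v), inner(v, v))
    return [xi - c * vi for xi, vi in zip(x, v)]

v, w = [3, 4], [5, 0]                     # distinct vectors, both of length 5
d = [vi - wi for vi, wi in zip(v, w)]     # v - w = (-2, 4)

print(reflect(d, v) == w)   # H_{v-w} sends v to w
print(reflect(d, w) == v)   # H_{v-w} sends w to v
```

Applying `reflect(d, ...)` twice returns the original vector, which reflects the identity $H_v^{-1} = H_v$.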


Reflections can be used to characterize unitary operators. 


Theorem 10.17 Let $V$ be a finite-dimensional inner product space. The
following are equivalent for an operator $\tau \in L(V)$:




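1) $\tau$ is unitary/orthogonal
2) $\tau$ is a product of reflections.

Proof. Since reflections are unitary/orthogonal and the product of unitary/orthogonal
operators is unitary/orthogonal, it follows that 2) implies 1). For the converse,
let $\tau$ be unitary. Let $\mathcal{B} = (u_1,\dots,u_n)$ be an orthonormal basis for $V$. Then
$$H_{\tau u_1 - u_1}(\tau u_1) = u_1$$
and so if $x_1 = \tau u_1 - u_1$ then
$$(H_{x_1}\tau)u_1 = u_1$$
that is, $\tau_1 := H_{x_1}\tau$ is the identity on $\langle u_1\rangle$. (If some $x_k = 0$, then no reflection is
needed at that step.) Suppose that we have found
reflections $H_{x_1},\dots,H_{x_{k-1}}$ for which $\tau_{k-1} := H_{x_{k-1}}\cdots H_{x_1}\tau$ is the identity on
$\langle u_1,\dots,u_{k-1}\rangle$. Then
$$H_{\tau_{k-1}u_k - u_k}(\tau_{k-1}u_k) = u_k$$
Moreover, we claim that $(\tau_{k-1}u_k - u_k) \perp u_i$ for $i < k$, since
$$\begin{aligned}
\langle \tau_{k-1}u_k - u_k, u_i\rangle &= \langle (H_{x_{k-1}}\cdots H_{x_1}\tau)u_k, u_i\rangle \\
&= \langle \tau u_k, H_{x_1}\cdots H_{x_{k-1}}u_i\rangle \\
&= \langle \tau u_k, \tau u_i\rangle \\
&= \langle u_k, u_i\rangle \\
&= 0
\end{aligned}$$
Hence, if $x_k = \tau_{k-1}u_k - u_k$, then for $i < k$,
$$(H_{x_k}\cdots H_{x_1}\tau)u_i = H_{x_k}u_i = u_i$$
and so $\tau_k := H_{x_k}\cdots H_{x_1}\tau$ is the identity on $\langle u_1,\dots,u_k\rangle$. Thus, for $k = n$ we
have $H_{x_n}\cdots H_{x_1}\tau = \iota$ and so $\tau = H_{x_1}\cdots H_{x_n}$, as desired. $\square$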
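
The inductive construction in this proof is effectively an algorithm. The sketch below follows it for a real orthogonal matrix with rational entries, taking $u_1,\dots,u_n$ to be the standard basis; it is a minimal illustration under those assumptions, and the function names are ours. Steps where $x_k = 0$ are skipped, since $u_k$ is then already fixed.

```python
from fractions import Fraction as Fr

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def identity(n):
    return [[Fr(int(i == j)) for j in range(n)] for i in range(n)]

def householder(x):
    """Matrix of H_x = I - 2 x x^t / <x, x> for a nonzero real vector x."""
    n, d = len(x), Fr(sum(a * a for a in x))
    return [[Fr(int(i == j)) - 2 * x[i] * x[j] / d for j in range(n)]
            for i in range(n)]

def reflection_factors(T):
    """Return reflection matrices H_1, ..., H_m with T = H_1 H_2 ... H_m.

    T is assumed to be a real orthogonal matrix (list of rows)."""
    n = len(T)
    current, factors = [row[:] for row in T], []
    for k in range(n):
        # x_k = tau_{k-1} u_k - u_k, with u_k the k-th standard basis vector
        x = [current[i][k] - int(i == k) for i in range(n)]
        if any(x):
            H = householder(x)
            current = mat_mul(H, current)   # tau_k = H_{x_k} tau_{k-1}
            factors.append(H)
    if current != identity(n):
        raise ValueError("input matrix is not orthogonal")
    return factors

T = [[Fr(3, 5), Fr(-4, 5)], [Fr(4, 5), Fr(3, 5)]]   # a rotation of R^2
factors = reflection_factors(T)
product = identity(2)
for H in factors:
    product = mat_mul(product, H)
print(len(factors), product == T)
```

For this rotation the construction produces two reflections, and multiplying them in the order $H_{x_1}H_{x_2}$ recovers the original matrix.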


The Structure of Normal Operators 


The following theorem includes the spectral theorems stated above for real and 
complex normal operators, along with some further refinements related to self- 
adjoint and unitary/orthogonal operators. 


Theorem 10.18 (The structure theorem for normal operators)
1) (Complex case) Let $V$ be a finite-dimensional complex inner product
space.
   a) The following are equivalent for $\tau \in L(V)$:
      i) $\tau$ is normal
      ii) $\tau$ is unitarily diagonalizable
      iii) $\tau$ has an orthogonal spectral resolution
      $$\tau = \lambda_1\rho_1 + \cdots + \lambda_k\rho_k$$
   b) Among the normal operators, the Hermitian operators are precisely
   those for which all complex eigenvalues are real.
   c) Among the normal operators, the unitary operators are precisely those
   for which all complex eigenvalues have norm 1.
2) (Real case) Let $V$ be a finite-dimensional real inner product space.
   a) $\tau \in L(V)$ is normal if and only if
   $$V = \mathcal{E}_{\lambda_1} \odot \cdots \odot \mathcal{E}_{\lambda_k} \odot W_1 \odot \cdots \odot W_m$$
   where $\{\lambda_1,\dots,\lambda_k\}$ is the spectrum of $\tau$ and each $W_i$ is a
   two-dimensional indecomposable $\tau$-invariant subspace with an ordered
   basis $\mathcal{B}_i$ for which
   $$[\tau]_{\mathcal{B}_i} = \begin{bmatrix} a_i & -b_i \\ b_i & a_i \end{bmatrix}$$
   b) Among the real normal operators, the symmetric operators are those
   for which there are no subspaces $W_i$ in the decomposition of part 2a).
   Hence, the following are equivalent for $\tau \in L(V)$:
      i) $\tau$ is symmetric.
      ii) $\tau$ is orthogonally diagonalizable.
      iii) $\tau$ has the orthogonal spectral resolution
      $$\tau = \lambda_1\rho_1 + \cdots + \lambda_k\rho_k$$
   c) Among the real normal operators, the orthogonal operators are
   precisely those for which the eigenvalues are equal to $\pm 1$ and the
   matrices $[\tau]_{\mathcal{B}_i}$ described in part 2a) have rows (and columns) of norm
   1, that is,
   $$[\tau]_{\mathcal{B}_i} = \begin{bmatrix} \sin\theta & -\cos\theta \\ \cos\theta & \sin\theta \end{bmatrix}$$
   for some $\theta \in \mathbb{R}$.


Proof. We have proved part 1a). As to part 1b), it is only necessary to look at a
diagonal matrix $A$ representing $\tau$. This matrix has the eigenvalues of $\tau$ on its
main diagonal and so it is Hermitian if and only if the eigenvalues of $\tau$ are real.
Similarly, $A$ is unitary if and only if the eigenvalues of $\tau$ have absolute value
equal to 1.

We have proved part 2a). Parts 2b) and 2c) follow by looking at the matrix
$A = [\tau]_{\mathcal{B}}$ where $\mathcal{B} = \bigcup \mathcal{B}_i$. This matrix is symmetric if and only if $A$ is diagonal,
and $A$ is orthogonal if and only if $\lambda_i = \pm 1$ and the matrices $[\tau]_{\mathcal{B}_i}$ have
orthonormal rows. $\square$


Matrix Versions 


We can formulate matrix versions of the structure theorem for normal operators. 




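Theorem 10.19 (The structure theorem for normal matrices)
1) (Complex case)
   a) A complex matrix $A$ is normal if and only if it is unitarily
   diagonalizable, that is, if and only if there is a unitary matrix $U$ for
   which
   $$UAU^* = \operatorname{diag}(\lambda_1,\dots,\lambda_k)$$
   b) A complex matrix $A$ is Hermitian if and only if 1a) holds, where all
   eigenvalues $\lambda_i$ are real.
   c) A complex matrix $A$ is unitary if and only if 1a) holds, where all
   eigenvalues $\lambda_i$ have norm 1.
2) (Real case)
   a) A real matrix $A$ is normal if and only if there is an orthogonal matrix
   $O$ for which
   $$OAO^t = \operatorname{diag}\left(\lambda_1,\dots,\lambda_k, \begin{bmatrix} a_1 & -b_1 \\ b_1 & a_1 \end{bmatrix},\dots,\begin{bmatrix} a_m & -b_m \\ b_m & a_m \end{bmatrix}\right)$$
   b) A real matrix $A$ is symmetric if and only if it is orthogonally
   diagonalizable, that is, if and only if there is an orthogonal matrix $O$
   for which
   $$OAO^t = \operatorname{diag}(\lambda_1,\dots,\lambda_k)$$
   c) A real matrix $A$ is orthogonal if and only if there is an orthogonal
   matrix $O$ for which
   $$OAO^t = \operatorname{diag}\left(\lambda_1,\dots,\lambda_k, \begin{bmatrix} \sin\theta_1 & -\cos\theta_1 \\ \cos\theta_1 & \sin\theta_1 \end{bmatrix},\dots,\begin{bmatrix} \sin\theta_m & -\cos\theta_m \\ \cos\theta_m & \sin\theta_m \end{bmatrix}\right)$$
   where each $\lambda_i = \pm 1$, for some $\theta_1,\dots,\theta_m \in \mathbb{R}$. $\square$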


Functional Calculus 


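Let $\tau$ be a normal operator on a finite-dimensional inner product space $V$ and let
$\tau$ have spectral resolution
$$\tau = \lambda_1\rho_1 + \cdots + \lambda_k\rho_k$$
Since each $\rho_i$ is idempotent, we have $\rho_i^m = \rho_i$ for all $m \ge 1$. The pairwise
orthogonality of the projections implies that
$$\tau^m = (\lambda_1\rho_1 + \cdots + \lambda_k\rho_k)^m = \lambda_1^m\rho_1 + \cdots + \lambda_k^m\rho_k$$
More generally, for any polynomial $p(x)$ over $F$,
$$p(\tau) = p(\lambda_1)\rho_1 + \cdots + p(\lambda_k)\rho_k$$
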


Note that a polynomial of degree at most $k - 1$ is uniquely determined by specifying an




arbitrary set of $k$ of its values at the distinct points $a_1,\dots,a_k$. This follows from
the Lagrange interpolation formula
$$p(x) = \sum_{i=1}^{k} p(a_i) \prod_{j \ne i} \frac{x - a_j}{a_i - a_j}$$
Therefore, we can define a unique polynomial $p(x)$ by specifying the values
$p(\lambda_i)$, for $i = 1,\dots,k$.
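For example, for a given $1 \le j \le k$, if $p_j(x)$ is a polynomial for which 

\[ p_j(\lambda_i) = \delta_{i,j} \]

for $i = 1, \dots, k$, then 

\[ p_j(\tau) = \rho_j \]

and so each projection $\rho_j$ is a polynomial function of $\tau$. As another example, if 
$\tau$ is invertible and $p(\lambda_i) = \lambda_i^{-1}$, then 

\[ p(\tau) = \lambda_1^{-1}\rho_1 + \cdots + \lambda_k^{-1}\rho_k = \tau^{-1} \]

as can easily be verified by direct calculation. Finally, if $p(\lambda_i) = \overline{\lambda_i}$, then since 
each $\rho_i$ is self-adjoint, we have 

\[ p(\tau) = \overline{\lambda_1}\rho_1 + \cdots + \overline{\lambda_k}\rho_k = \tau^* \]

and so $\tau^*$ is a polynomial in $\tau$. 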


We can extend this idea further by defining, for any function 

\[ f\colon \{\lambda_1, \dots, \lambda_k\} \to F \]

the linear operator $f(\tau)$ by 

\[ f(\tau) = f(\lambda_1)\rho_1 + \cdots + f(\lambda_k)\rho_k \]

For example, we may define $\sqrt{\tau}$, $\tau^{-1}$, $e^{\tau}$ and so on. Notice, however, that 
since the spectral resolution of $\tau$ is a finite sum, we gain nothing (but 
convenience) by using functions other than polynomials, for we can always find 
a polynomial $p(x)$ for which $p(\lambda_i) = f(\lambda_i)$ for $i = 1, \dots, k$ and so 
$f(\tau) = p(\tau)$. The study of the properties of functions of an operator $\tau$ is 
referred to as the functional calculus of $\tau$. 


According to the spectral theorem, if $V$ is complex and $\tau$ is normal, then $f(\tau)$ is 
a normal operator whose eigenvalues are $f(\lambda_i)$. Similarly, if $V$ is real and $\tau$ is 
symmetric, then $f(\tau)$ is symmetric, with eigenvalues $f(\lambda_i)$. 
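
The functional calculus can be tried out on a small case. The following is a minimal sketch, not taken from the text: the matrix `tau`, its eigenvalues 1 and 3, and its two orthogonal projections are hand-picked assumptions, and all helper names are hypothetical. It computes f(τ) from the spectral resolution and, separately, from the Lagrange interpolating polynomial, using exact rational arithmetic.

```python
from fractions import Fraction as Fr

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_scale(c, A):
    return [[c * a for a in row] for row in A]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

IDENTITY = [[Fr(1), Fr(0)], [Fr(0), Fr(1)]]
ZERO = [[Fr(0), Fr(0)], [Fr(0), Fr(0)]]

# Assumed example: tau = [[2, 1], [1, 2]] has eigenvalues 1 and 3.
half = Fr(1, 2)
tau = [[Fr(2), Fr(1)], [Fr(1), Fr(2)]]
eigenvalues = [Fr(1), Fr(3)]
projections = [
    [[half, -half], [-half, half]],  # orthogonal projection onto span{(1, -1)}
    [[half, half], [half, half]],    # orthogonal projection onto span{(1, 1)}
]

def apply_function(f):
    """f(tau) = f(lambda_1) rho_1 + ... + f(lambda_k) rho_k."""
    result = ZERO
    for lam, rho in zip(eigenvalues, projections):
        result = mat_add(result, mat_scale(f(lam), rho))
    return result

def apply_interpolating_polynomial(f):
    """Evaluate at tau the Lagrange polynomial p with p(lambda_i) = f(lambda_i)."""
    total = ZERO
    for i, lam_i in enumerate(eigenvalues):
        term = mat_scale(f(lam_i), IDENTITY)
        for j, lam_j in enumerate(eigenvalues):
            if j != i:
                shifted = mat_add(tau, mat_scale(-lam_j, IDENTITY))
                term = mat_mul(term, mat_scale(1 / (lam_i - lam_j), shifted))
        total = mat_add(total, term)
    return total

inverse = apply_function(lambda x: 1 / x)   # tau^{-1} via f(x) = 1/x
cube = apply_function(lambda x: x ** 3)     # tau^3 via f(x) = x^3
```

Because the arithmetic is exact, the two routes can be compared with `==`; for instance `mat_mul(tau, inverse)` can be checked against `IDENTITY`.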




Commutativity 


The functional calculus can be applied to the study of the commutativity 
properties of operators. Here are two simple examples. 


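Theorem 10.20 Let $V$ be a finite-dimensional complex inner product space. 
For $\tau, \sigma \in \mathcal{L}(V)$, we write $\tau \leftrightarrow \sigma$ to denote the fact that $\tau$ and $\sigma$ commute. Let 
$\tau$ and $\sigma$ have spectral resolutions 

\[ \tau = \lambda_1\rho_1 + \cdots + \lambda_k\rho_k \]
\[ \sigma = \mu_1\nu_1 + \cdots + \mu_m\nu_m \]

Then 
1) For any $\phi \in \mathcal{L}(V)$, 

\[ \phi \leftrightarrow \tau \iff \phi \leftrightarrow \rho_i \text{ for all } i \]

2) 

\[ \tau \leftrightarrow \sigma \iff \rho_i \leftrightarrow \nu_j \text{ for all } i, j \]

3) If $f\colon \{\lambda_1, \dots, \lambda_k\} \to F$ and $g\colon \{\mu_1, \dots, \mu_m\} \to F$ are injective functions, 
then 

\[ f(\tau) \leftrightarrow g(\sigma) \iff \tau \leftrightarrow \sigma \]

Proof. For 1), if $\phi \leftrightarrow \rho_i$ for all $i$, then $\phi \leftrightarrow \tau$ and the converse follows from the 
fact that $\rho_i$ is a polynomial in $\tau$. Part 2) is similar. For part 3), $\tau \leftrightarrow \sigma$ clearly 
implies $f(\tau) \leftrightarrow g(\sigma)$. For the converse, let $\Lambda = \{\lambda_1, \dots, \lambda_k\}$. Since $f$ is 
injective, the inverse function $f^{-1}\colon f(\Lambda) \to \Lambda$ is well-defined and 
$f^{-1}(f(\tau)) = \tau$. Thus, $\tau$ is a function of $f(\tau)$. Similarly, $\sigma$ is a function of $g(\sigma)$. 
It follows that $f(\tau) \leftrightarrow g(\sigma)$ implies $\tau \leftrightarrow \sigma$. □ 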


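Theorem 10.21 Let $\tau$ and $\sigma$ be normal operators on a finite-dimensional 
complex inner product space $V$. Then $\tau$ and $\sigma$ commute if and only if they have 
the form 

\[ \tau = p(r(\tau, \sigma)) \]
\[ \sigma = q(r(\tau, \sigma)) \]

where $p(x)$, $q(x)$ and $r(x, y)$ are polynomials. 
Proof. If $\tau$ and $\sigma$ are polynomials in $\theta = r(\tau, \sigma)$, then they clearly commute. 
For the converse, suppose that $\tau\sigma = \sigma\tau$ and let 

\[ \tau = \lambda_1\rho_1 + \cdots + \lambda_k\rho_k \]

and 

\[ \sigma = \mu_1\nu_1 + \cdots + \mu_m\nu_m \]

be the orthogonal spectral resolutions of $\tau$ and $\sigma$. 

Then Theorem 10.20 implies that $\rho_i\nu_j = \nu_j\rho_i$. Hence, 

\[ \tau\sigma = (\lambda_1\rho_1 + \cdots + \lambda_k\rho_k)(\mu_1\nu_1 + \cdots + \mu_m\nu_m) = \sum_{i,j} \lambda_i\mu_j\,\rho_i\nu_j \]

It follows that for any polynomial $r(x, y)$ in two variables, 

\[ r(\tau, \sigma) = \sum_{i,j} r(\lambda_i, \mu_j)\,\rho_i\nu_j \]

So if we choose $r(x, y)$ with the property that the values $\alpha_{i,j} = r(\lambda_i, \mu_j)$ are distinct, then 

\[ r(\tau, \sigma) = \sum_{i,j} \alpha_{i,j}\,\rho_i\nu_j \]

and we can also choose $p(x)$ and $q(x)$ so that $p(\alpha_{i,j}) = \lambda_i$ for all $j$ and 
$q(\alpha_{i,j}) = \mu_j$ for all $i$. Then, since $\nu_1 + \cdots + \nu_m = \iota$, 

\[ p(r(\tau, \sigma)) = \sum_{i,j} p(\alpha_{i,j})\,\rho_i\nu_j = \sum_{i,j} \lambda_i\,\rho_i\nu_j = \Big(\sum_i \lambda_i\rho_i\Big)\Big(\sum_j \nu_j\Big) = \tau \]

and similarly, $q(r(\tau, \sigma)) = \sigma$. □ 
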
Positive Operators 


One of the most important cases of the functional calculus is $f(x) = \sqrt{x}$. 
Recall that the quadratic form associated with a linear operator $\tau$ is 

\[ Q_\tau(v) = \langle \tau v, v \rangle \]

Definition A self-adjoint linear operator $\tau \in \mathcal{L}(V)$ is 
1) positive if $Q_\tau(v) \ge 0$ for all $v \in V$ 
2) positive definite if $Q_\tau(v) > 0$ for all $v \ne 0$. □ 


Theorem 10.22 A self-adjoint operator $\tau$ on a finite-dimensional inner product 
space is 
1) positive if and only if all of its eigenvalues are nonnegative 
2) positive definite if and only if all of its eigenvalues are positive. 
Proof. If $Q_\tau(v) \ge 0$ and $\tau v = \lambda v$ for some $v \ne 0$, then 

\[ 0 \le \langle \tau v, v \rangle = \lambda \langle v, v \rangle \]

and so $\lambda \ge 0$. Conversely, if all eigenvalues of $\tau$ are nonnegative, then 

\[ \tau = \lambda_1\rho_1 + \cdots + \lambda_k\rho_k, \quad \lambda_i \ge 0 \]

and since $\iota = \rho_1 + \cdots + \rho_k$, 

\[ \langle \tau v, v \rangle = \sum_{i,j} \lambda_i \langle \rho_i v, \rho_j v \rangle = \sum_i \lambda_i \|\rho_i v\|^2 \ge 0 \]

and so $\tau$ is positive. Part 2) is proved similarly. □ 


If $\tau$ is a positive operator, with spectral resolution 

\[ \tau = \lambda_1\rho_1 + \cdots + \lambda_k\rho_k, \quad \lambda_i \ge 0 \]

then we may take the positive square root of $\tau$, 

\[ \sqrt{\tau} = \sqrt{\lambda_1}\,\rho_1 + \cdots + \sqrt{\lambda_k}\,\rho_k \]

where $\sqrt{\lambda_i}$ is the nonnegative square root of $\lambda_i$. It is clear that 

\[ (\sqrt{\tau})^2 = \tau \]

and it is not hard to see that $\sqrt{\tau}$ is the only positive operator whose square is $\tau$. 
In other words, every positive operator has a unique positive square root. 
Conversely, if $\tau$ has a positive square root, that is, if $\tau = \sigma^2$, for some positive 
operator $\sigma$, then $\tau$ is positive. Hence, an operator $\tau$ is positive if and only if it 
has a positive square root. 

If $\tau$ is positive, then $\sqrt{\tau}$ is self-adjoint and so 

\[ (\sqrt{\tau})^*\sqrt{\tau} = \tau \]

Conversely, if $\tau = \sigma^*\sigma$ for some operator $\sigma$, then $\tau$ is positive, since it is clearly 
self-adjoint and 

\[ \langle \tau v, v \rangle = \langle \sigma^*\sigma v, v \rangle = \langle \sigma v, \sigma v \rangle \ge 0 \]

Thus, $\tau$ is positive if and only if it has the form $\tau = \sigma^*\sigma$ for some operator $\sigma$. 
(A complex number $z$ is nonnegative if and only if it has the form $z = \overline{w}w$ for 
some complex number $w$.) 

Theorem 10.23 Let $\tau \in \mathcal{L}(V)$. 
1) $\tau$ is positive if and only if it has a positive square root. 
2) $\tau$ is positive if and only if it has the form $\tau = \sigma^*\sigma$ for some operator $\sigma$. □ 


Here is an application of square roots. 


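Theorem 10.24 If $\tau$ and $\sigma$ are positive operators and $\tau\sigma = \sigma\tau$, then $\tau\sigma$ is 
positive. 
Proof. Since $\tau$ is a positive operator, it has a positive square root $\sqrt{\tau}$, which is 
a polynomial in $\tau$. A similar statement holds for $\sigma$. Therefore, since $\tau$ and $\sigma$ 
commute, so do $\sqrt{\tau}$ and $\sqrt{\sigma}$. Hence, 

\[ (\sqrt{\tau}\sqrt{\sigma})^2 = (\sqrt{\tau})^2(\sqrt{\sigma})^2 = \tau\sigma \]

Since $\sqrt{\tau}$ and $\sqrt{\sigma}$ are self-adjoint and commute, their product is self-adjoint 
and so $\tau\sigma$, being the square of a self-adjoint operator, is positive. □ 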


The Polar Decomposition of an Operator 


It is well known that any nonzero complex number $z$ can be written in the polar 
form $z = re^{i\theta}$, where $r$ is a positive number and $\theta$ is real. We can do the same 
for any nonzero linear operator $\tau$ on a finite-dimensional complex inner product 
space. 

Theorem 10.25 Let $\tau$ be a nonzero linear operator on a finite-dimensional 
complex inner product space $V$. 
1) There exist a positive operator $\rho$ and a unitary operator $\nu$ for which 
$\tau = \nu\rho$. Moreover, $\rho$ is unique and if $\tau$ is invertible, then $\nu$ is also unique. 
2) Similarly, there exist a positive operator $\sigma$ and a unitary operator $\mu$ for 
which $\tau = \sigma\mu$. Moreover, $\sigma$ is unique and if $\tau$ is invertible, then $\mu$ is also 
unique. 
Proof. Let us suppose for a moment that $\tau = \nu\rho$. Then 

\[ \tau^* = (\nu\rho)^* = \rho^*\nu^* = \rho\nu^{-1} \]

and so 

\[ \tau^*\tau = \rho\nu^{-1}\nu\rho = \rho^2 \]

Also, if $v \in V$, then 

\[ \tau v = \nu(\rho v) \]

These equations give us a clue as to how to define $\rho$ and $\nu$. 

Let us define $\rho$ to be the unique positive square root of the positive operator 
$\tau^*\tau$. Then 

\[ \|\rho v\|^2 = \langle \rho v, \rho v \rangle = \langle \rho^2 v, v \rangle = \langle \tau^*\tau v, v \rangle = \|\tau v\|^2 \tag{10.2} \]

Define $\nu$ on $\operatorname{im}(\rho)$ by 

\[ \nu(\rho v) = \tau v \]

for all $v \in V$. Equation (10.2) shows that $\rho x = \rho y$ implies that $\tau x = \tau y$ and so 
this definition of $\nu$ on $\operatorname{im}(\rho)$ is well-defined. 

Moreover, $\nu$ is an isometry on $\operatorname{im}(\rho)$, since (10.2) gives 

\[ \|\nu(\rho v)\| = \|\tau v\| = \|\rho v\| \]

Thus, if $\mathcal{B} = \{b_1, \dots, b_r\}$ is an orthonormal basis for $\operatorname{im}(\rho)$, then 
$\nu\mathcal{B} = \{\nu b_1, \dots, \nu b_r\}$ is an orthonormal basis for $\nu(\operatorname{im}(\rho)) = \operatorname{im}(\tau)$. Finally, we 
may extend both orthonormal bases to orthonormal bases for $V$ and then extend 
the definition of $\nu$ to an isometry on $V$ for which $\tau = \nu\rho$. 

As for the uniqueness, we have seen that $\rho$ must satisfy $\rho^2 = \tau^*\tau$ and since $\rho^2$ 
has a unique positive square root, we deduce that $\rho$ is uniquely defined. Finally, 
if $\tau$ is invertible, then so is $\rho$ since $\ker(\rho) \subseteq \ker(\tau)$. Hence, $\nu = \tau\rho^{-1}$ is 
uniquely determined by $\tau$. 

Part 2) can be proved by applying part 1) to the map $\tau^*$, to get 

\[ \tau = (\tau^*)^* = (\nu\rho)^* = \rho\nu^{-1} = \rho\mu \]

where $\mu = \nu^{-1}$ is unitary. □ 
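
The construction in the proof can be followed numerically. The sketch below makes several assumptions that are not in the text: it works with an invertible real 2×2 matrix (so the adjoint is the transpose and the unitary factor is orthogonal), and it replaces the spectral-resolution square root by a well-known closed form for 2×2 positive definite matrices, sqrt(M) = (M + sI)/t with s = sqrt(det M) and t = sqrt(tr M + 2s). All function names are hypothetical.

```python
import math

def mat_mul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def inverse_2x2(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

def positive_sqrt_2x2(M):
    """Positive square root of a 2x2 positive definite symmetric matrix
    (closed form for the 2x2 case only, an assumption of this sketch)."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    s = math.sqrt(det)
    t = math.sqrt(M[0][0] + M[1][1] + 2 * s)
    return [[(M[i][j] + (s if i == j else 0.0)) / t for j in range(2)]
            for i in range(2)]

def polar_decomposition(tau):
    """Return (nu, rho) with tau = nu * rho, where rho = sqrt(tau^t tau)
    and nu = tau * rho^{-1}, as in the proof for invertible tau."""
    rho = positive_sqrt_2x2(mat_mul(transpose(tau), tau))
    nu = mat_mul(tau, inverse_2x2(rho))
    return nu, rho

def is_close(A, B, tol=1e-9):
    return all(math.isclose(A[i][j], B[i][j], abs_tol=tol)
               for i in range(2) for j in range(2))

tau = [[1.0, 2.0], [0.0, 1.0]]
nu, rho = polar_decomposition(tau)
```

With these values one can check, up to rounding, that `nu` has orthonormal columns, that `rho` squares to the product of `transpose(tau)` and `tau`, and that `mat_mul(nu, rho)` recovers `tau`.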


We leave it as an exercise to show that any unitary operator $\mu$ has the form 
$\mu = e^{i\sigma}$, where $\sigma$ is a self-adjoint operator. This gives the following corollary. 

Corollary 10.26 (Polar decomposition) Let $\tau$ be a nonzero linear operator on 
a finite-dimensional complex inner product space. Then there is a positive 
operator $\rho$ and a self-adjoint operator $\sigma$ for which $\tau$ has the polar 
decomposition 

\[ \tau = \rho e^{i\sigma} \]

Moreover, $\rho$ is unique and if $\tau$ is invertible, then $\sigma$ is also unique. □ 

Normal operators can be characterized using the polar decomposition. 

Theorem 10.27 Let $\tau = \rho e^{i\sigma}$ be a polar decomposition of a nonzero linear 
operator $\tau$. Then $\tau$ is normal if and only if $\rho\sigma = \sigma\rho$. 
Proof. Since 

\[ \tau\tau^* = \rho e^{i\sigma} e^{-i\sigma} \rho = \rho^2 \]

and 

\[ \tau^*\tau = e^{-i\sigma} \rho\rho e^{i\sigma} = e^{-i\sigma} \rho^2 e^{i\sigma} \]

we see that $\tau$ is normal if and only if 

\[ e^{-i\sigma} \rho^2 e^{i\sigma} = \rho^2 \]

or equivalently, 

\[ \rho^2 e^{i\sigma} = e^{i\sigma} \rho^2 \]

Now, $\rho$ is a polynomial in $\rho^2$ and $\sigma$ is a polynomial in $e^{i\sigma}$ and so this holds if 
and only if $\rho\sigma = \sigma\rho$. □ 


Exercises 


1. Let $\tau \in \mathcal{L}(U, V)$. If $\tau$ is surjective, find a formula for a right inverse of $\tau$ 
in terms of $\tau^*$. If $\tau$ is injective, find a formula for a left inverse of $\tau$ in terms 
of $\tau^*$. Hint: Consider $\tau\tau^*$ and $\tau^*\tau$. 

2. Let $\tau \in \mathcal{L}(V)$ where $V$ is a complex vector space and let 

\[ \tau_1 = \frac{1}{2}(\tau + \tau^*) \quad\text{and}\quad \tau_2 = \frac{1}{2i}(\tau - \tau^*) \]

Show that $\tau_1$ and $\tau_2$ are self-adjoint and that 

\[ \tau = \tau_1 + i\tau_2 \quad\text{and}\quad \tau^* = \tau_1 - i\tau_2 \]

What can you say about the uniqueness of these representations of $\tau$ and 
$\tau^*$? 

3. Prove that all of the roots of the characteristic polynomial of a skew- 
Hermitian matrix are pure imaginary. 

4. Give an example of a normal operator that is neither self-adjoint nor 
unitary. 

5. Prove that if $\|\tau v\| = \|\tau^*(v)\|$ for all $v \in V$, where $V$ is complex, then $\tau$ is 
normal. 

6. Let $\tau$ be a normal operator on a complex finite-dimensional inner product 
space $V$ or a self-adjoint operator on a real finite-dimensional inner product 
space. 

a) Show that $\tau^* = p(\tau)$, for some polynomial $p(x) \in \mathbb{C}[x]$. 
b) Show that for any $\sigma \in \mathcal{L}(V)$, $\sigma\tau = \tau\sigma$ implies $\sigma\tau^* = \tau^*\sigma$. In other 
words, $\tau^*$ commutes with all operators that commute with $\tau$. 

7. Show that a linear operator $\tau$ on a finite-dimensional complex inner product 
space $V$ is normal if and only if whenever $S$ is an invariant subspace under 
$\tau$, so is $S^\perp$. 

8. Let $V$ be a finite-dimensional inner product space and let $\tau$ be a normal 
operator on $V$. 

a) Prove that if $\tau$ is idempotent, then it is also self-adjoint. 
b) Prove that if $\tau$ is nilpotent, then $\tau = 0$. 
c) Prove that if $\tau^2 = \tau^3$, then $\tau$ is idempotent. 

9. Show that if $\tau$ is a normal operator on a finite-dimensional complex inner 
product space, then the algebraic multiplicity is equal to the geometric 
multiplicity for all eigenvalues of $\tau$. 

10. Show that two orthogonal projections $\sigma$ and $\rho$ are orthogonal to each other 
if and only if $\operatorname{im}(\sigma) \perp \operatorname{im}(\rho)$. 


11. Let $\tau$ be a normal operator and let $\sigma$ be any operator on $V$. If the 
eigenspaces of $\tau$ are $\sigma$-invariant, show that $\tau$ and $\sigma$ commute. 

12. Prove that if $\tau$ and $\sigma$ are normal operators on a finite-dimensional complex 
inner product space and if $\tau\theta = \theta\sigma$ for some operator $\theta$, then $\tau^*\theta = \theta\sigma^*$. 

13. Prove that if two normal $n \times n$ complex matrices are similar, then they are 
unitarily similar, that is, similar via a unitary matrix. 

14. If $\nu$ is a unitary operator on a complex inner product space, show that there 
exists a self-adjoint operator $\sigma$ for which $\nu = e^{i\sigma}$. 

15. Show that a positive operator has a unique positive square root. 

16. Prove that if $\tau$ has a square root, that is, if $\tau = \sigma^2$, for some positive 
operator $\sigma$, then $\tau$ is positive. 

17. Prove that if $\sigma \le \tau$ (that is, $\tau - \sigma$ is positive) and if $\theta$ is a positive operator 
that commutes with both $\sigma$ and $\tau$, then $\sigma\theta \le \tau\theta$. 

18. Using the QR factorization, prove the following result, known as the 
Cholesky decomposition. An invertible linear operator $\tau \in \mathcal{L}(V)$ is positive 
if and only if it has the form $\tau = \rho^*\rho$ where $\rho$ is upper triangularizable. 
Moreover, $\rho$ can be chosen with positive eigenvalues, in which case the 
factorization is unique. 

19. Does every self-adjoint operator on a finite-dimensional real inner product 
space have a square root? 

20. Let $\tau$ be a linear operator on $\mathbb{C}^n$ and let $\lambda_1, \dots, \lambda_n$ be the eigenvalues of $\tau$, 
each one written a number of times equal to its algebraic multiplicity. Show 
that 

\[ \sum_i |\lambda_i|^2 \le \operatorname{tr}(\tau^*\tau) \]

where tr is the trace. Show also that equality holds if and only if $\tau$ is 
normal. 

21. If $\tau \in \mathcal{L}(V)$ where $V$ is a real inner product space, show that the Hilbert 
space adjoint satisfies $(\tau^*)^{\mathbb{C}} = (\tau^{\mathbb{C}})^*$. 


Part II—Topics 


Chapter 11 
Metric Vector Spaces: The Theory of 
Bilinear Forms 


In this chapter, we study vector spaces over arbitrary fields that have a bilinear 
form defined on them. 


Unless otherwise mentioned, all vector spaces are assumed to be 
finite-dimensional. The symbol $F$ denotes an arbitrary field and $F_q$ denotes a finite 
field of size $q$. 


Symmetric, Skew-Symmetric and Alternate Forms 
We begin with the basic definition. 
Definition Let $V$ be a vector space over $F$. A mapping $\langle\,,\rangle\colon V \times V \to F$ is 
called a bilinear form if it is linear in each coordinate, that is, if 

\[ \langle \alpha x + \beta y, z \rangle = \alpha\langle x, z \rangle + \beta\langle y, z \rangle \]

and 

\[ \langle z, \alpha x + \beta y \rangle = \alpha\langle z, x \rangle + \beta\langle z, y \rangle \]

A bilinear form is 
1) symmetric if 

\[ \langle x, y \rangle = \langle y, x \rangle \]

for all $x, y \in V$. 
2) skew-symmetric (or antisymmetric) if 

\[ \langle x, y \rangle = -\langle y, x \rangle \]

for all $x, y \in V$. 
3) alternate (or alternating) if 

\[ \langle x, x \rangle = 0 \]

for all $x \in V$. 
A bilinear form that is either symmetric, skew-symmetric, or alternate is 
referred to as an inner product and a pair $(V, \langle\,,\rangle)$, where $V$ is a vector space 
and $\langle\,,\rangle$ is an inner product on $V$, is called a metric vector space or inner 
product space. As usual, we will refer to $V$ as a metric vector space when the 
form is understood. 
4) A metric vector space $V$ with a symmetric form is called an orthogonal 
geometry over $F$. 
5) A metric vector space $V$ with an alternate form is called a symplectic 
geometry over $F$. □ 


The term symplectic, from the Greek for “intertwined,” was introduced in 1939 
by the famous mathematician Hermann Weyl in his book The Classical Groups, 
as a substitute for the term complex. According to the dictionary, symplectic 
means “relating to or being an intergrowth of two different minerals.” An 
example is ophicalcite, which is marble spotted with green serpentine. 


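Example 11.1 Minkowski space $M_4$ is the four-dimensional real orthogonal 
geometry $\mathbb{R}^4$ with inner product defined by 

\[ \langle e_1, e_1 \rangle = \langle e_2, e_2 \rangle = \langle e_3, e_3 \rangle = 1 \]
\[ \langle e_4, e_4 \rangle = -1 \]
\[ \langle e_i, e_j \rangle = 0 \text{ for } i \ne j \]

where $e_1, \dots, e_4$ is the standard basis for $\mathbb{R}^4$. □ 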


As is traditional, when the inner product is understood, we will use the phrase 
“let V be a metric vector space.” 


The real inner products discussed in Chapter 9 are inner products in the present 
sense and have the additional property of being positive definite—a notion that 
does not even make sense if the base field is not ordered. Thus, a real inner 
product space is an orthogonal geometry. On the other hand, the complex inner 
products of Chapter 9, being sesquilinear, are not inner products in the present 
sense. For this reason, we use the term metric vector space in this chapter, rather 
than inner product space. 


If $S$ is a vector subspace of a metric vector space $V$, then $S$ inherits the metric 
structure from $V$. With this structure, we refer to $S$ as a subspace of $V$. 


The concepts of being symmetric, skew-symmetric and alternate are not 
independent. However, their relationship depends on the characteristic of the 
base field F, as do many other properties of metric vector spaces. In fact, the 




next theorem tells us that we do not need to consider skew-symmetric forms per 
se, since skew-symmetry is always equivalent to either symmetry or 
alternateness. 


Theorem 11.1 Let $V$ be a vector space over a field $F$. 
1) If $\operatorname{char}(F) = 2$, then 

\[ \text{alternate} \Rightarrow \text{symmetric} \Leftrightarrow \text{skew-symmetric} \]

2) If $\operatorname{char}(F) \ne 2$, then 

\[ \text{alternate} \Leftrightarrow \text{skew-symmetric} \]

Also, in this case, the only form that is both alternate and symmetric is the zero form: 
$\langle x, y \rangle = 0$ for all $x, y \in V$. 
Proof. First note that for an alternating form over any base field, 

\[ 0 = \langle x + y, x + y \rangle = \langle x, y \rangle + \langle y, x \rangle \]

and so 

\[ \langle x, y \rangle = -\langle y, x \rangle \]

which shows that the form is skew-symmetric. Thus, alternate always implies 
skew-symmetric. 

If $\operatorname{char}(F) = 2$, then $-1 = 1$ and so the definitions of symmetric and skew- 
symmetric are equivalent, which proves 1). If $\operatorname{char}(F) \ne 2$ and the form is 
skew-symmetric, then for any $x \in V$, we have $\langle x, x \rangle = -\langle x, x \rangle$ or $2\langle x, x \rangle = 0$, 
which implies that $\langle x, x \rangle = 0$. Hence, the form is alternate. Finally, if the form 
is alternate and symmetric, then it is also skew-symmetric and so 
$\langle u, v \rangle = -\langle u, v \rangle$ for all $u, v \in V$, that is, $2\langle u, v \rangle = 0$ and hence 
$\langle u, v \rangle = 0$ for all $u, v \in V$. □ 


Example 11.2 The standard inner product on $V(n, q)$, defined by 

\[ (x_1, \dots, x_n) \cdot (y_1, \dots, y_n) = x_1y_1 + \cdots + x_ny_n \]

is symmetric, but not alternate, since 

\[ (1, 0, \dots, 0) \cdot (1, 0, \dots, 0) = 1 \ne 0 \]
□ 


The Matrix of a Bilinear Form 


If $\mathcal{B} = (b_1, \dots, b_n)$ is an ordered basis for a metric vector space $V$, then a 
bilinear form is completely determined by the $n \times n$ matrix of values 

\[ M_{\mathcal{B}} = (a_{i,j}) = (\langle b_i, b_j \rangle) \]

This is referred to as the matrix of the form (or the matrix of $V$) with respect to 
the ordered basis $\mathcal{B}$. Moreover, any $n \times n$ matrix over $F$ is the matrix of some 
bilinear form on $V$. 




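Note that if $x = \sum r_i b_i$ then 

\[ M_{\mathcal{B}}[x]_{\mathcal{B}} = \begin{bmatrix} \langle b_1, x \rangle \\ \vdots \\ \langle b_n, x \rangle \end{bmatrix} \]

and 

\[ [x]_{\mathcal{B}}^t M_{\mathcal{B}} = \begin{bmatrix} \langle x, b_1 \rangle & \cdots & \langle x, b_n \rangle \end{bmatrix} \]

It follows that if $y = \sum s_i b_i$, then 

\[ [x]_{\mathcal{B}}^t M_{\mathcal{B}} [y]_{\mathcal{B}} = \begin{bmatrix} \langle x, b_1 \rangle & \cdots & \langle x, b_n \rangle \end{bmatrix} \begin{bmatrix} s_1 \\ \vdots \\ s_n \end{bmatrix} = \langle x, y \rangle \]

and this uniquely defines the matrix $M_{\mathcal{B}}$, that is, if $[x]_{\mathcal{B}}^t A [y]_{\mathcal{B}} = \langle x, y \rangle$ for all 
$x, y \in V$, then $A = M_{\mathcal{B}}$. 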


A matrix is alternate if it is skew-symmetric and has 0's on the main diagonal. 
Thus, we can say that a form is symmetric (skew-symmetric, alternate) if and 
only if the matrix $M_{\mathcal{B}}$ is symmetric (skew-symmetric, alternate). 
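
As a small illustration of how the three properties depend on the characteristic, here is a sketch that classifies an integer matrix viewed over the prime field with p elements. The function name `classify` and the sample matrices are assumptions made for this example only.

```python
def classify(M, p):
    """Classify a square integer matrix M, read modulo the prime p, as
    symmetric, skew-symmetric and/or alternate (skew with zero diagonal)."""
    n = len(M)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    symmetric = all((M[i][j] - M[j][i]) % p == 0 for i, j in pairs)
    skew = all((M[i][j] + M[j][i]) % p == 0 for i, j in pairs)
    alternate = skew and all(M[i][i] % p == 0 for i in range(n))
    return {"symmetric": symmetric, "skew-symmetric": skew, "alternate": alternate}

H = [[0, 1], [1, 0]]     # symmetric over every field
J = [[0, 1], [-1, 0]]    # alternate over every field
I2 = [[1, 0], [0, 1]]    # identity matrix

over_f2 = {name: classify(M, 2) for name, M in (("H", H), ("J", J), ("I2", I2))}
over_f3 = {name: classify(M, 3) for name, M in (("H", H), ("J", J), ("I2", I2))}
```

Over the field with two elements, `H` comes out symmetric, skew-symmetric and alternate at once, while `I2` is symmetric and skew-symmetric but not alternate; over the field with three elements, `H` is symmetric only and `J` is skew-symmetric and alternate but not symmetric.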


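Now let us see how the matrix of a form behaves with respect to a change of 
basis. Let $\mathcal{C} = (c_1, \dots, c_n)$ be an ordered basis for $V$. Recall from Chapter 2 that 
the change of basis matrix $M_{\mathcal{C},\mathcal{B}}$, whose $i$th column is $[c_i]_{\mathcal{B}}$, satisfies 

\[ [v]_{\mathcal{B}} = M_{\mathcal{C},\mathcal{B}}[v]_{\mathcal{C}} \]

Hence, 

\[ \langle x, y \rangle = [x]_{\mathcal{B}}^t M_{\mathcal{B}} [y]_{\mathcal{B}} = ([x]_{\mathcal{C}}^t M_{\mathcal{C},\mathcal{B}}^t) M_{\mathcal{B}} (M_{\mathcal{C},\mathcal{B}}[y]_{\mathcal{C}}) = [x]_{\mathcal{C}}^t (M_{\mathcal{C},\mathcal{B}}^t M_{\mathcal{B}} M_{\mathcal{C},\mathcal{B}}) [y]_{\mathcal{C}} \]

and so 

\[ M_{\mathcal{C}} = M_{\mathcal{C},\mathcal{B}}^t M_{\mathcal{B}} M_{\mathcal{C},\mathcal{B}} \]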


This prompts the following definition. 


Definition Two matrices $A, B \in \mathcal{M}_n(F)$ are congruent if there exists an 
invertible matrix $P$ for which 

\[ A = P^t B P \]

The equivalence classes under congruence are called congruence classes. □ 

Thus, if two matrices represent the same bilinear form on $V$, they must be 
congruent. Conversely, if $B = M_{\mathcal{B}}$ represents a bilinear form on $V$ and 

\[ A = P^t B P \]

where $P$ is invertible, then there is an ordered basis $\mathcal{C}$ for $V$ for which 

\[ P = M_{\mathcal{C},\mathcal{B}} \]

and so 

\[ A = M_{\mathcal{C},\mathcal{B}}^t M_{\mathcal{B}} M_{\mathcal{C},\mathcal{B}} \]

Thus, $A = M_{\mathcal{C}}$ represents the same form with respect to $\mathcal{C}$. 

Theorem 11.2 Let $\mathcal{B} = (b_1, \dots, b_n)$ be an ordered basis for an inner product 
space $V$, with matrix 

\[ M_{\mathcal{B}} = (\langle b_i, b_j \rangle) \]

1) The form can be recovered from the matrix by the formula 

\[ \langle x, y \rangle = [x]_{\mathcal{B}}^t M_{\mathcal{B}} [y]_{\mathcal{B}} \]

2) If $\mathcal{C} = (c_1, \dots, c_n)$ is also an ordered basis for $V$, then 

\[ M_{\mathcal{C}} = M_{\mathcal{C},\mathcal{B}}^t M_{\mathcal{B}} M_{\mathcal{C},\mathcal{B}} \]

where $M_{\mathcal{C},\mathcal{B}}$ is the change of basis matrix from $\mathcal{C}$ to $\mathcal{B}$. 
3) Two matrices $A$ and $B$ represent the same bilinear form on a vector space 
$V$ if and only if they are congruent, in which case they represent the same 
set of bilinear forms on $V$. □ 


In view of the fact that congruent matrices have the same rank, we may define 
the rank of a bilinear form (or of V) to be the rank of any matrix that represents 
that form. 


The Discriminant of a Form 
If $A$ and $B$ are congruent matrices, then 

\[ \det(A) = \det(P^t B P) = \det(P)^2 \det(B) \]

and so $\det(A)$ and $\det(B)$ differ by a square factor. The discriminant $\Delta$ of a 
bilinear form is the set of determinants of all of the matrices that represent the 
form. Thus, if $\mathcal{B}$ is an ordered basis for $V$, then 

\[ \Delta = F^2 \det(M_{\mathcal{B}}) = \{ r^2 \det(M_{\mathcal{B}}) \mid 0 \ne r \in F \} \]
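
The change of basis formula and the square factor in the determinant can be checked on a concrete case. The following sketch uses hand-picked integer matrices, which are assumptions of the example: `B` plays the role of the matrix of the form in one basis and `P` plays the role of the change of basis matrix, so that `A` is the matrix of the same form in the new basis.

```python
def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def det2(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def form(M, x, y):
    """Value x^t M y of the bilinear form with matrix M on coordinate vectors."""
    return sum(x[i] * M[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))

def apply(P, v):
    """Coordinate change [v]_B = P [v]_C."""
    return [sum(P[i][j] * v[j] for j in range(len(v))) for i in range(len(P))]

B = [[1, 2], [2, 0]]                      # matrix of the form in basis B (assumed)
P = [[1, 1], [0, 2]]                      # invertible change of basis matrix (assumed)
A = mat_mul(mat_mul(transpose(P), B), P)  # congruent matrix P^t B P

x_c, y_c = [3, -1], [2, 5]                # coordinates in the new basis C
same_value = form(A, x_c, y_c) == form(B, apply(P, x_c), apply(P, y_c))
square_factor = det2(A) == det2(P) ** 2 * det2(B)
```

Both `same_value` and `square_factor` should hold for any choice of invertible `P`; the particular numbers here only make the arithmetic easy to follow by hand.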




Quadratic Forms 
There is a close link between symmetric bilinear forms on $V$ and quadratic 
forms on $V$. 

Definition A quadratic form on a vector space $V$ is a map $Q\colon V \to F$ with the 
following properties: 
1) For all $r \in F$, $v \in V$, 

\[ Q(rv) = r^2 Q(v) \]

2) The map 

\[ \langle u, v \rangle_Q = Q(u + v) - Q(u) - Q(v) \]

is a (symmetric) bilinear form. □ 

Thus, every quadratic form $Q$ on $V$ defines a symmetric bilinear form $\langle u, v \rangle_Q$ 
on $V$. Conversely, if $\operatorname{char}(F) \ne 2$ and if $\langle\,,\rangle$ is a symmetric bilinear form on $V$, 
then the function 

\[ Q(x) = \frac{1}{2}\langle x, x \rangle \]

is a quadratic form. Moreover, the bilinear form associated with $Q$ is the 
original bilinear form: 

\[ \langle u, v \rangle_Q = Q(u + v) - Q(u) - Q(v) = \frac{1}{2}\langle u + v, u + v \rangle - \frac{1}{2}\langle u, u \rangle - \frac{1}{2}\langle v, v \rangle = \frac{1}{2}\langle u, v \rangle + \frac{1}{2}\langle v, u \rangle = \langle u, v \rangle \]

Thus, the maps $\langle\,,\rangle \to Q$ and $Q \to \langle\,,\rangle_Q$ are inverses and so there is a one-to-one 
correspondence between symmetric bilinear forms on $V$ and quadratic forms on 
$V$. Put another way, knowing the quadratic form is equivalent to knowing the 
corresponding bilinear form. 
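
The correspondence between a symmetric form and its quadratic form can be verified by direct computation. This is a minimal sketch over the rationals (characteristic 0, so division by 2 is allowed); the symmetric matrix `M` and the test vectors are assumptions of the example.

```python
from fractions import Fraction as Fr

M = [[Fr(1), Fr(2)], [Fr(2), Fr(0)]]   # matrix of an assumed symmetric form

def form(u, v):
    """Symmetric bilinear form with matrix M."""
    return sum(u[i] * M[i][j] * v[j] for i in range(2) for j in range(2))

def Q(x):
    """Quadratic form Q(x) = (1/2) <x, x>."""
    return Fr(1, 2) * form(x, x)

def polar_form(u, v):
    """Bilinear form recovered from Q: Q(u + v) - Q(u) - Q(v)."""
    s = [a + b for a, b in zip(u, v)]
    return Q(s) - Q(u) - Q(v)

u = [Fr(3), Fr(-1)]
v = [Fr(1, 2), Fr(4)]
recovered = polar_form(u, v) == form(u, v)
r = Fr(5, 3)
homogeneous = Q([r * a for a in u]) == r ** 2 * Q(u)
```

Here `recovered` confirms that the form obtained from `Q` is the original one on the chosen pair, and `homogeneous` checks the defining property of a quadratic form under scaling.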


Again assuming that $\operatorname{char}(F) \ne 2$, if $\mathcal{B} = (v_1, \dots, v_n)$ is an ordered basis for an 
orthogonal geometry $V$ and if the matrix of the symmetric form on $V$ is 
$M_{\mathcal{B}} = (a_{i,j})$ then for $x = \sum x_i v_i$, 

\[ Q(x) = \frac{1}{2}\langle x, x \rangle = \frac{1}{2}[x]_{\mathcal{B}}^t M_{\mathcal{B}} [x]_{\mathcal{B}} = \frac{1}{2}\sum_{i,j} a_{i,j} x_i x_j \]

and so $Q(x)$ is a homogeneous polynomial of degree 2 in the coordinates $x_i$. 
(The term "form" means homogeneous polynomial, hence the term quadratic 
form.) 




Orthogonality 


As we will see, not all metric vector spaces behave as nicely as real inner 
product spaces and this necessitates the introduction of a new set of terminology 
to cover various types of behavior. (The base field F is the culprit, of course.) 
The most striking differences stem from the possibility that $\langle x, x \rangle = 0$ for a 
nonzero vector $x \in V$. 


The following terminology should be familiar. 


Definition Let $V$ be a metric vector space. A vector $x$ is orthogonal to a vector 
$y$, written $x \perp y$, if $\langle x, y \rangle = 0$. A vector $x \in V$ is orthogonal to a subset $S$ of 
$V$, written $x \perp S$, if $\langle x, s \rangle = 0$ for all $s \in S$. A subset $S$ of $V$ is orthogonal to a 
subset $T$ of $V$, written $S \perp T$, if $\langle s, t \rangle = 0$ for all $s \in S$ and $t \in T$. The 
orthogonal complement $X^\perp$ of a subset $X$ of $V$ is the subspace 

\[ X^\perp = \{ v \in V \mid v \perp X \} \]
□ 


Note that regardless of whether the form is symmetric or alternate (and hence 
skew-symmetric), orthogonality is a symmetric relation, that is, $x \perp y$ implies 
$y \perp x$. Indeed, this is precisely why we restrict attention to these two types of 
bilinear forms. 


There are two types of degenerate behaviors that a vector may possess: It may 
be orthogonal to itself or, worse yet, it may be orthogonal to every vector in V. 
With respect to the former, we have the following terminology. 


Definition Let $V$ be a metric vector space. 
1) A nonzero $x \in V$ is isotropic (or null) if $\langle x, x \rangle = 0$; otherwise it is 
nonisotropic. 
2) $V$ is isotropic if it contains at least one isotropic vector. Otherwise, $V$ is 
nonisotropic (or anisotropic). 
3) $V$ is totally isotropic (that is, symplectic) if all vectors in $V$ are 
isotropic. □ 

Note that if $v$ is an isotropic vector, then so is $av$ for all nonzero $a \in F$. This can be 
expressed by saying that the set $I$ of isotropic vectors in $V$, together with 0, is a cone in $V$. (A 
cone in $V$ is a nonempty subset that is closed under scalar multiplication.) 
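
The Minkowski space of Example 11.1 gives a concrete picture of this cone. The sketch below, with hypothetical helper names and hand-picked vectors, checks that scalar multiples of an isotropic vector are isotropic and that the sum of two isotropic vectors need not be, so the cone is in general not a subspace.

```python
def minkowski(x, y):
    """Inner product of Example 11.1: +1 on e1, e2, e3 and -1 on e4."""
    return x[0] * y[0] + x[1] * y[1] + x[2] * y[2] - x[3] * y[3]

def is_isotropic(x):
    """A nonzero vector x is isotropic if <x, x> = 0."""
    return any(c != 0 for c in x) and minkowski(x, x) == 0

def scale(a, x):
    return tuple(a * c for c in x)

def add(x, y):
    return tuple(a + b for a, b in zip(x, y))

v = (1, 0, 0, 1)     # isotropic: 1 - 1 = 0
w = (1, 0, 0, -1)    # isotropic as well
e1 = (1, 0, 0, 0)    # nonisotropic: <e1, e1> = 1

multiples_isotropic = all(is_isotropic(scale(a, v)) for a in (-3, 2, 7))
sum_isotropic = is_isotropic(add(v, w))   # v + w = (2, 0, 0, 0)
```

Here `multiples_isotropic` holds, while `sum_isotropic` is false because the inner product of v + w with itself is 4.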


With respect to the more severe forms of degeneracy, we have the following 
terminology. 


Definition Let $V$ be a metric vector space. 
1) A vector $v \in V$ is degenerate if $v \perp V$. The set $V^\perp$ of all degenerate 
vectors is called the radical of $V$ and denoted by $\operatorname{rad}(V)$. Thus, 

\[ \operatorname{rad}(V) = V^\perp \]

2) $V$ is nonsingular, or nondegenerate, if $\operatorname{rad}(V) = \{0\}$. 
3) $V$ is singular, or degenerate, if $\operatorname{rad}(V) \ne \{0\}$. 
4) $V$ is totally singular, or totally degenerate, if $\operatorname{rad}(V) = V$. □ 


Some of the above terminology is not entirely standard, so care should be 
exercised in reading the literature. 


Theorem 11.3 A metric vector space V is nonsingular if and only if all 
representing matrices Mg are nonsingular. O 


A note of caution is in order. If $S$ is a subspace of a metric vector space $V$, then 
$\operatorname{rad}(S)$ denotes the set of vectors in $S$ that are degenerate in $S$, that is, $\operatorname{rad}(S)$ is 
the radical of $S$, as a metric vector space in its own right. However, $S^\perp$ denotes 
the set of all vectors in $V$ that are orthogonal to $S$. Thus, 

\[ \operatorname{rad}(S) = S \cap S^\perp \]

Note also that 

\[ \operatorname{rad}(S) = S \cap S^\perp \subseteq S^{\perp\perp} \cap S^\perp = \operatorname{rad}(S^\perp) \]

and so if $S$ is singular, then so is $S^\perp$. 


Example 11.3 Recall that $V(n, q)$ is the set of all ordered $n$-tuples whose 
components come from the finite field $F_q$. (See Example 11.2.) It is easy to see 
that the subspace 

\[ S = \{0000, 1100, 0011, 1111\} \]

of $V(4, 2)$ has the property that $S = S^\perp$. Note also that $V(4, 2)$ is nonsingular 
and yet the subspace $S$ is totally singular. □ 
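
Because $V(4, 2)$ has only sixteen vectors, the claims of this example can be confirmed by exhaustive search. The following sketch uses the standard inner product of Example 11.2 reduced modulo 2; the function names are hypothetical.

```python
from itertools import product

def dot_mod2(x, y):
    """Standard inner product on V(4, 2)."""
    return sum(a * b for a, b in zip(x, y)) % 2

V = list(product((0, 1), repeat=4))   # all 16 vectors of V(4, 2)
S = {(0, 0, 0, 0), (1, 1, 0, 0), (0, 0, 1, 1), (1, 1, 1, 1)}

def perp(X):
    """Orthogonal complement of a subset X inside V(4, 2)."""
    return {v for v in V if all(dot_mod2(v, x) == 0 for x in X)}

S_perp = perp(S)          # expected to coincide with S
rad_V = perp(V)           # radical of the whole space
rad_S = S & S_perp        # radical of S as a space in its own right
```

The search gives `S_perp == S`, a radical of the whole space containing only the zero vector, and `rad_S == S`, matching the statements that $V(4, 2)$ is nonsingular while $S$ is totally singular.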


The following result explains why we restrict attention to symmetric or alternate 
forms (which includes skew-symmetric forms). 


Theorem 11.4 Let $V$ be a vector space with a bilinear form. The following are 
equivalent: 
1) Orthogonality is a symmetric relation, that is, 

\[ x \perp y \Rightarrow y \perp x \]

2) The form on $V$ is symmetric or alternate, that is, $V$ is a metric vector 
space. 




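Proof. It is clear that orthogonality is symmetric if the form is symmetric or 
alternate, since in the latter case, the form is also skew-symmetric. 

For the converse, assume that orthogonality is symmetric. For convenience, let 
$x \bowtie y$ mean that $\langle x, y \rangle = \langle y, x \rangle$ and let $x \bowtie V$ mean that $\langle x, v \rangle = \langle v, x \rangle$ for all 
$v \in V$. If $x \bowtie V$ for all $x \in V$, then $V$ is orthogonal and we are done. So let us 
examine vectors $x$ with the property that $x \not\bowtie V$. 

We wish to show that 

\[ x \not\bowtie V \;\Rightarrow\; x \text{ is isotropic and } (x \bowtie y \Rightarrow x \perp y) \tag{11.1} \]

Note that if the second conclusion holds, then since $x \bowtie x$, it follows that $x$ is 
isotropic. So suppose that $x \bowtie y$. Since $x \not\bowtie V$, there is a $z \in V$ for which 
$\langle x, z \rangle \ne \langle z, x \rangle$ and so $x \perp y$ if and only if 

\[ \langle x, y \rangle (\langle x, z \rangle - \langle z, x \rangle) = 0 \]

Now, 

\[ \langle x, y \rangle (\langle x, z \rangle - \langle z, x \rangle) = \langle x, y \rangle \langle x, z \rangle - \langle x, y \rangle \langle z, x \rangle \]
\[ = \langle y, x \rangle \langle x, z \rangle - \langle x, y \rangle \langle z, x \rangle \]
\[ = \langle x, \langle y, x \rangle z - y \langle z, x \rangle \rangle \]

But reversing the coordinates in the last expression gives 

\[ \langle \langle y, x \rangle z - y \langle z, x \rangle, x \rangle = \langle y, x \rangle \langle z, x \rangle - \langle y, x \rangle \langle z, x \rangle = 0 \]

and so the symmetry of orthogonality implies that the last expression is 0 and so 
we have proven (11.1). 

Let us assume that $V$ is not orthogonal and show that all vectors in $V$ are 
isotropic, whence $V$ is symplectic. Since $V$ is not orthogonal, there exist 
$u, v \in V$ for which $u \not\bowtie v$ and so $u \not\bowtie V$ and $v \not\bowtie V$. Hence, the vectors $u$ and $v$ 
are isotropic and for all $y \in V$, 

\[ y \bowtie u \Rightarrow y \perp u \]
\[ y \bowtie v \Rightarrow y \perp v \]

Since all vectors $w$ for which $w \not\bowtie V$ are isotropic, let $w \bowtie V$. Then $w \bowtie u$ and 
$w \bowtie v$ and so $w \perp u$ and $w \perp v$. Now write 

\[ w = (w - u) + u \]

where $w - u \perp u$, since $u$ is isotropic. Since the sum of two orthogonal 
isotropic vectors is isotropic, it follows that $w$ is isotropic if $w - u$ is isotropic. 
But, since $w \perp v$, 

\[ \langle w - u, v \rangle = -\langle u, v \rangle \ne -\langle v, u \rangle = \langle v, w - u \rangle \]

and so $(w - u) \not\bowtie V$, which implies that $w - u$ is isotropic. Thus, $w$ is also 
isotropic and so all vectors in $V$ are isotropic. □ 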


Orthogonal and Symplectic Geometries 


If a metric vector space is both orthogonal and symplectic, then the form is both 
symmetric and skew-symmetric and so 

\[ \langle u, v \rangle = \langle v, u \rangle = -\langle u, v \rangle \]

Therefore, when $\operatorname{char}(F) \ne 2$, $V$ is orthogonal and symplectic if and only if $V$ 
is totally degenerate. 

However, if $\operatorname{char}(F) = 2$, then there are orthogonal symplectic geometries that 
are not totally degenerate. For example, let $V = \operatorname{span}(u, v)$ be a two- 
dimensional vector space and define a form on $V$ whose matrix is 

\[ M = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \]

Since $M$ is both symmetric and alternate, so is the form. 


Linear Functionals 


The Riesz representation theorem says that every linear functional $f$ on a finite- 
dimensional real or complex inner product space $V$ is represented by a Riesz 
vector $R_f \in V$, in the sense that 

\[ f(v) = \langle v, R_f \rangle \]

for all $v \in V$. A similar result holds for nonsingular metric vector spaces. 

Let $V$ be a metric vector space over $F$. Let $x \in V$ and define the inner product 
map $\langle \cdot, x \rangle\colon V \to F$ by 

\[ \langle \cdot, x \rangle v = \langle v, x \rangle \]

This is easily seen to be a linear functional and so we can define a map 
$\tau\colon V \to V^*$ by 

\[ \tau x = \langle \cdot, x \rangle \]

The bilinearity of the form ensures that $\tau$ is linear and the kernel of $\tau$ is 

\[ \ker(\tau) = \{ x \in V \mid \langle V, x \rangle = \{0\} \} = V^\perp = \operatorname{rad}(V) \]

Hence, $\tau$ is injective (and therefore an isomorphism) if and only if $V$ is 
nonsingular. 


Theorem 11.5 (The Riesz representation theorem) Let $V$ be a finite- 
dimensional nonsingular metric vector space. The map $\tau\colon V \to V^*$ defined by 

\[ \tau x = \langle \cdot, x \rangle \]

is an isomorphism from $V$ to $V^*$. It follows that for each $f \in V^*$ there exists a 
unique vector $x \in V$ for which 

\[ f v = \langle v, x \rangle \]

for all $v \in V$. □ 


The requirement that V be nonsingular is necessary. As a simple example, if V 
is totally singular, then no nonzero linear functional could possibly be 
represented by an inner product. 


The Riesz representation theorem applies to nonsingular metric vector spaces. 
However, we can also achieve something useful for singular subspaces $S$ of a 
nonsingular metric vector space. The reason is that any linear functional $f \in S^*$ 
can be extended to a linear functional $\bar{f}$ on $V$, where it has a Riesz vector, that 
is, 

\[ \bar{f} v = \langle v, R_{\bar{f}} \rangle = \langle \cdot, R_{\bar{f}} \rangle v \]

Hence, $f$ also has this form, where its "Riesz vector" is an element of $V$, but is 
not necessarily in $S$. 


Theorem 11.6 (The Riesz representation theorem for subspaces) Let $S$ be a 
subspace of a metric vector space $V$. If either $V$ or $S$ is nonsingular, the linear 
map $\tau\colon V \to S^*$ defined by 

\[ \tau x = \langle \cdot, x \rangle|_S \]

is surjective and has kernel $S^\perp$. Hence, for any linear functional $f \in S^*$, there 
is a (not necessarily unique) vector $x \in V$ for which $f s = \langle s, x \rangle$ for all $s \in S$. 
Moreover, if $S$ is nonsingular, then $x$ can be taken from $S$, in which case it is 
unique. □ 


Orthogonal Complements and Orthogonal Direct Sums 


Definition A metric vector space $V$ is the orthogonal direct sum of the 
subspaces $S$ and $T$, written 

\[ V = S \odot T \]

if $V = S \oplus T$ and $S \perp T$. □ 

If $S$ is a subspace of a real inner product space, the projection theorem says that 
the orthogonal complement $S^\perp$ of $S$ is a true vector space complement of $S$, 
that is, 

\[ V = S \odot S^\perp \]

However, in general metric vector spaces, an orthogonal complement may not 
be a vector space complement. In fact, Example 11.3 shows that in some cases 
$S^\perp = S$. In other cases, for example, if $v$ is degenerate, then $\langle v \rangle^\perp = V$. 
However, as we will see, the orthogonal complement of $S$ is a vector space 
complement if and only if either the sum is correct, $V = S + S^\perp$, or the 
intersection is correct, $S \cap S^\perp = \{0\}$. Note that the latter is equivalent to the 
nonsingularity of $S$. 


Many nice properties of orthogonality in real inner product spaces do carry over 
to nonsingular metric vector spaces. Moreover, the next result shows that the 
restriction to nonsingular spaces is not that severe. 


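Theorem 11.7 Let $V$ be a metric vector space. Then 

\[ V = \operatorname{rad}(V) \odot S \]

where $S$ is nonsingular and $\operatorname{rad}(V)$ is totally singular. 
Proof. If $S$ is any vector space complement of $\operatorname{rad}(V)$, then $\operatorname{rad}(V) \perp S$ and so 

\[ V = \operatorname{rad}(V) \odot S \]

Also, $S$ is nonsingular since $\operatorname{rad}(S) \subseteq \operatorname{rad}(V)$. □ 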


Here are some properties of orthogonality in nonsingular metric vector spaces. 
In particular, if either V or S is nonsingular, then the orthogonal complement of 
S always has the expected dimension, 


dim(S*) = dim(V) — dim(S) 
even if S+ is not well behaved with respect to its intersection with S. 


Theorem 11.8 Let $S$ be a subspace of a finite-dimensional metric vector space $V$.
1) If either $V$ or $S$ is nonsingular, then

$$\dim(S) + \dim(S^\perp) = \dim(V)$$

Hence, the following are equivalent:
a) $V = S + S^\perp$
b) $S$ is nonsingular, that is, $S \cap S^\perp = \{0\}$
c) $V = S \odot S^\perp$
2) If $V$ is nonsingular, then
a) $S^{\perp\perp} = S$
b) $\operatorname{rad}(S) = \operatorname{rad}(S^\perp)$
c) $S$ is nonsingular if and only if $S^\perp$ is nonsingular.
Proof. For part 1), the map $\tau\colon V \to S^*$ of Theorem 11.6 is surjective and has
kernel $S^\perp$. Thus, the rank-plus-nullity theorem implies that

$$\dim(S^*) + \dim(S^\perp) = \dim(V)$$

However, $\dim(S^*) = \dim(S)$ and so part 1) follows. For part 2), since

$$\operatorname{rad}(S) = S \cap S^\perp \subseteq S^{\perp\perp} \cap S^\perp = \operatorname{rad}(S^\perp)$$

the nonsingularity of $S^\perp$ implies the nonsingularity of $S$. Then part 1) implies
that

$$\dim(S) + \dim(S^\perp) = \dim(V)$$

and

$$\dim(S^\perp) + \dim(S^{\perp\perp}) = \dim(V)$$

Hence, $S^{\perp\perp} = S$ and $\operatorname{rad}(S) = \operatorname{rad}(S^\perp)$. □
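The previous theorem cannot in general be strengthened. Consider the two-dimensional
metric vector space $V = \operatorname{span}(u, v)$ where

$$\langle u, u \rangle = 1, \quad \langle u, v \rangle = 0, \quad \langle v, v \rangle = 0$$

If $S = \operatorname{span}(u)$, then $S^\perp = \operatorname{span}(v)$. Now, $S$ is nonsingular but $S^\perp$ is singular
and so 2c) does not hold. Also, $\operatorname{rad}(S) = \{0\}$ and $\operatorname{rad}(S^\perp) = S^\perp$ and so 2b)
fails. Finally, $S^{\perp\perp} = V \ne S$ and so 2a) fails.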
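The counterexample above can be checked by brute force. The following is a minimal sketch, assuming the base field is the prime field $F_3$ (the text leaves the field unspecified) and representing vectors by coordinates in the basis $(u, v)$; the helper names `form`, `span` and `perp` are illustrative only.

```python
from itertools import product

P = 3  # assumption: work over the prime field F_3
GRAM = [[1, 0],
        [0, 0]]  # <u,u> = 1, <u,v> = 0, <v,v> = 0 in the basis (u, v)

def form(x, y):
    """Bilinear form <x, y> = x^T GRAM y, reduced mod P."""
    n = len(GRAM)
    return sum(x[i] * GRAM[i][j] * y[j] for i in range(n) for j in range(n)) % P

def span(gens):
    """All F_P-linear combinations of the generators, as a set of tuples."""
    n = len(GRAM)
    return {tuple(sum(c * g[i] for c, g in zip(coeffs, gens)) % P for i in range(n))
            for coeffs in product(range(P), repeat=len(gens))}

def perp(subspace):
    """Orthogonal complement: all vectors orthogonal to every vector of `subspace`."""
    n = len(GRAM)
    return {x for x in product(range(P), repeat=n)
            if all(form(x, s) == 0 for s in subspace)}

u, v = (1, 0), (0, 1)
V = span([u, v])
S = span([u])
S_perp = perp(S)
S_perp_perp = perp(S_perp)

print(S_perp == span([v]))                   # S-perp is span(v)
print((S & S_perp) == {(0, 0)})              # rad(S) = {0}: S is nonsingular
print((S_perp & S_perp_perp) == S_perp)      # rad(S-perp) = S-perp: S-perp is singular
print(S_perp_perp == V and S_perp_perp != S) # S-perp-perp = V, which is not S
```

All four checks hold, matching the failure of 2a), 2b) and 2c) when $V$ is singular.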


Isometries 


We now turn to a discussion of structure-preserving maps on metric vector 
spaces. 


Definition Let $V$ and $W$ be metric vector spaces. We use the same notation $\langle\,,\rangle$
for the bilinear form on each space. A bijective linear map $\tau\colon V \to W$ is called
an isometry if

$$\langle \tau u, \tau v \rangle = \langle u, v \rangle$$

for all vectors $u$ and $v$ in $V$. If an isometry exists from $V$ to $W$, we say that $V$
and $W$ are isometric and write $V \approx W$. It is evident that the set of all
isometries from $V$ to $V$ forms a group under composition.

If $V$ is a nonsingular orthogonal geometry, an isometry of $V$ is called an
orthogonal transformation. The set $O(V)$ of all orthogonal transformations
on $V$ is a group under composition, known as the orthogonal group of $V$.

If $V$ is a nonsingular symplectic geometry, an isometry of $V$ is called a
symplectic transformation. The set $\operatorname{Sp}(V)$ of all symplectic transformations on
$V$ is a group under composition, known as the symplectic group of $V$. □




Note that, in contrast to the case of real inner product spaces, we must include
the requirement that $\tau$ be bijective since this does not follow automatically if $V$
is singular. Here are a few of the basic properties of isometries.


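Theorem 11.9 Let $\tau \in \mathcal{L}(V, W)$ be a linear transformation between
finite-dimensional metric vector spaces $V$ and $W$.
1) Let $\mathcal{B} = \{v_1, \ldots, v_n\}$ be a basis for $V$. Then $\tau$ is an isometry if and only if $\tau$
is bijective and

$$\langle \tau v_i, \tau v_j \rangle = \langle v_i, v_j \rangle$$

for all $i, j$.
2) If $V$ is orthogonal and $\operatorname{char}(F) \ne 2$, then $\tau$ is an isometry if and only if it is
bijective and

$$\langle \tau v, \tau v \rangle = \langle v, v \rangle$$

for all $v \in V$.
3) Suppose that $\tau\colon V \approx W$ is an isometry and

$$V = S \odot S^\perp \quad\text{and}\quad W = T \odot T^\perp$$

If $\tau S = T$, then $\tau(S^\perp) = T^\perp$.
Proof. We prove part 3) only. To see that $\tau(S^\perp) = T^\perp$, if $z \in S^\perp$ and $t \in T$,
then since $T = \tau S$, we can write $t = \tau s$ for some $s \in S$ and so

$$\langle \tau z, t \rangle = \langle \tau z, \tau s \rangle = \langle z, s \rangle = 0$$

whence $\tau(S^\perp) \subseteq T^\perp$. But since the dimensions are equal, it follows that
$\tau(S^\perp) = T^\perp$. □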

Hyperbolic Spaces 

A special type of two-dimensional metric vector space plays an important role in
the structure theory of metric vector spaces.

Definition Let $V$ be a metric vector space. A hyperbolic pair is a pair of
vectors $u, v \in V$ for which

$$\langle u, u \rangle = \langle v, v \rangle = 0, \quad \langle u, v \rangle = 1$$

Note that $\langle v, u \rangle = 1$ if $V$ is orthogonal and $\langle v, u \rangle = -1$ if $V$ is symplectic. In
either case, the subspace $H = \operatorname{span}(u, v)$ is called a hyperbolic plane and any
space of the form

$$\mathcal{H} = H_1 \odot \cdots \odot H_k$$

where each $H_i$ is a hyperbolic plane, is called a hyperbolic space. If $(u_i, v_i)$ is
a hyperbolic pair for $H_i$, then we refer to the basis

$$(u_1, v_1, \ldots, u_k, v_k)$$

for $\mathcal{H}$ as a hyperbolic basis. (In the symplectic case, the usual term is
symplectic basis.) □


Note that any hyperbolic space H is nonsingular. 


In the orthogonal case, hyperbolic planes can be characterized by their degree of 
isotropy, so to speak. (In the symplectic case, all spaces are totally isotropic by 
definition.) Indeed, we leave it as an exercise to prove that a two-dimensional 
nonsingular orthogonal geometry V is a hyperbolic plane if and only if V 
contains exactly two one-dimensional totally isotropic (equivalently, totally 
degenerate) subspaces. Put another way, the cone of isotropic vectors is the 
union of two one-dimensional subspaces of V. 
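The characterization by isotropic lines can be explored numerically. The sketch below is an illustration only, assuming the base field is a prime field $F_p$ with $p$ odd; the function name `isotropic_lines` is hypothetical. It counts the one-dimensional totally isotropic subspaces of a plane given by a symmetric Gram matrix.

```python
from itertools import product

def isotropic_lines(gram, p):
    """Count one-dimensional totally isotropic subspaces of F_p^n for the
    symmetric form with the given Gram matrix (p an odd prime)."""
    n = len(gram)
    count = 0
    for x in product(range(p), repeat=n):
        if any(x):
            q = sum(x[i] * gram[i][j] * x[j] for i in range(n) for j in range(n)) % p
            if q == 0:
                count += 1
    return count // (p - 1)  # each line contains p - 1 nonzero vectors

hyperbolic = [[0, 1], [1, 0]]  # Gram matrix of a hyperbolic pair (orthogonal case)
euclidean = [[1, 0], [0, 1]]   # Gram matrix of an orthonormal basis

print(isotropic_lines(hyperbolic, 5))  # 2
print(isotropic_lines(euclidean, 3))   # 0
print(isotropic_lines(euclidean, 5))   # 2
```

Over $F_3$ the plane with an orthonormal basis has no isotropic line, so it is not hyperbolic. Over $F_5$ the same Gram matrix gives exactly two isotropic lines, because $-1$ is a square there.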


Nonsingular Completions of a Subspace 
Let $U$ be a subspace of a nonsingular metric vector space $V$. If $U$ is singular, it
is of interest to find a minimal nonsingular subspace of $V$ containing $U$.

Definition Let $V$ be a nonsingular metric vector space and let $U$ be a subspace
of $V$. A subspace $S$ of $V$ for which $U \subseteq S$ is called an extension of $U$. A
nonsingular completion of $U$ is an extension of $U$ that is minimal in the family
of all nonsingular extensions of $U$. □


Theorem 11.10 Let $V$ be a nonsingular finite-dimensional metric vector space
over $F$. We assume that $\operatorname{char}(F) \ne 2$ when $V$ is orthogonal.
1) Let $S$ be a subspace of $V$. If $v$ is isotropic and the orthogonal direct sum

$$\operatorname{span}(v) \odot S$$

exists, then there is a hyperbolic plane $H = \operatorname{span}(v, z)$ for which

$$H \odot S$$

exists. In particular, if $v$ is isotropic, then there is a hyperbolic plane
containing $v$.
2) Let $U$ be a subspace of $V$ and let

$$U = \operatorname{span}(v_1, \ldots, v_k) \odot W$$

where $W$ is nonsingular and $\{v_1, \ldots, v_k\}$ are linearly independent in
$\operatorname{rad}(U)$. Then there is a hyperbolic space $\mathcal{H}_k = H_1 \odot \cdots \odot H_k$ with
hyperbolic basis $(v_1, z_1, \ldots, v_k, z_k)$ for which

$$\overline{U} = \mathcal{H}_k \odot W$$

is a nonsingular proper extension of $U$. If $\{v_1, \ldots, v_k\}$ is a basis for
$\operatorname{rad}(U)$, then

$$\dim(\overline{U}) = \dim(U) + \dim(\operatorname{rad}(U))$$

and we refer to $\overline{U}$ as a hyperbolic extension of $U$. If $U$ is nonsingular, we
say that $U$ is a hyperbolic extension of itself.
Proof. For part 1), the nonsingularity of $V$ implies that $S^{\perp\perp} = S$. Hence,
$v \notin S = S^{\perp\perp}$ and so there is an $x \in S^\perp$ for which $\langle v, x \rangle \ne 0$. If $V$ is
symplectic, then all vectors are isotropic and so we can take $z = (1/\langle v, x \rangle)x$. If
$V$ is orthogonal, let $z = rv + sx$. The conditions defining $(v, z)$ as a hyperbolic
pair are (since $v$ is isotropic)

$$1 = \langle v, z \rangle = \langle v, rv + sx \rangle = s\langle v, x \rangle$$

and

$$0 = \langle z, z \rangle = \langle rv + sx, rv + sx \rangle = 2rs\langle v, x \rangle + s^2\langle x, x \rangle = 2r + s^2\langle x, x \rangle$$

Since $\langle v, x \rangle \ne 0$, the first of these equations can be solved for $s$ and since
$\operatorname{char}(F) \ne 2$, the second equation can then be solved for $r$. Thus, in either case,
there is a vector $z \in S^\perp$ for which $H = \operatorname{span}(v, z) \subseteq S^\perp$ is hyperbolic. Hence,
$S \subseteq S^{\perp\perp} \subseteq H^\perp$ and since $H$ is nonsingular, that is, $H \cap H^\perp = \{0\}$, we have
$H \cap S = \{0\}$ and so $H \odot S$ exists.
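The orthogonal case of this construction is easy to carry out explicitly. The following is a minimal sketch, assuming the field is $\mathbb{Q}$ (so the characteristic is not 2) and that the form is given by a symmetric Gram matrix `G`; the names `form` and `hyperbolic_partner` are illustrative, not from the text.

```python
from fractions import Fraction

def form(G, x, y):
    """Bilinear form <x, y> = x^T G y."""
    n = len(G)
    return sum(x[i] * G[i][j] * y[j] for i in range(n) for j in range(n))

def hyperbolic_partner(G, v, x):
    """Given an isotropic v and any x with <v, x> != 0, return z = r v + s x
    with <v, z> = 1 and <z, z> = 0 (orthogonal geometry, characteristic not 2)."""
    vx = Fraction(form(G, v, x))
    if form(G, v, v) != 0 or vx == 0:
        raise ValueError("need v isotropic and <v, x> nonzero")
    s = 1 / vx                       # solves 1 = s <v, x>
    r = -s * s * form(G, x, x) / 2   # solves 0 = 2 r + s^2 <x, x>
    return [r * vi + s * xi for vi, xi in zip(v, x)]

G = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, -1]]          # assumed example: a nonsingular form on Q^3
v = [1, 0, 1]             # isotropic: 1 - 1 = 0
x = [2, 3, 0]             # <v, x> = 2
z = hyperbolic_partner(G, v, x)
print(form(G, v, z), form(G, z, z))  # 1 0
```

Here $s = 1/2$ and $r = -13/8$, so $(v, z)$ is a hyperbolic pair as the proof predicts.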


Part 2) is proved by induction on $k$. Note first that all of the vectors $v_i$ are
isotropic. If $k = 1$, then $\operatorname{span}(v_1) \odot W$ exists and so part 1) implies that there is
a hyperbolic plane $H = \operatorname{span}(v_1, z)$ for which $H \odot W$ exists.

Assume that the result is true for independent sets of size less than $k \ge 2$. Since

$$\operatorname{span}(v_1) \odot (\operatorname{span}(v_2, \ldots, v_k) \odot W)$$

exists, part 1) implies that there exists a hyperbolic plane $H_1 = \operatorname{span}(v_1, z_1)$ for
which

$$H_1 \odot (\operatorname{span}(v_2, \ldots, v_k) \odot W)$$

exists. Since $v_2, \ldots, v_k$ are in the radical of $\operatorname{span}(v_2, \ldots, v_k) \odot W$, the inductive
hypothesis implies that there is a hyperbolic space $H_2 \odot \cdots \odot H_k$ with
hyperbolic basis $(v_2, z_2, \ldots, v_k, z_k)$ for which the orthogonal direct sum

$$H_2 \odot \cdots \odot H_k \odot W$$

exists. Hence, $H_1 \odot \cdots \odot H_k \odot W$ also exists. □


We can now prove that the hyperbolic extensions of U are precisely the minimal 
nonsingular extensions of U. 


Theorem 11.11 (Nonsingular extension theorem) Let $U$ be a subspace of a
nonsingular finite-dimensional metric vector space $V$. The following are
equivalent:
1) $T = \mathcal{H} \odot W$ is a hyperbolic extension of $U$
2) $T$ is a minimal nonsingular extension of $U$
3) $T$ is a nonsingular extension of $U$ and

$$\dim(T) = \dim(U) + \dim(\operatorname{rad}(U))$$

Thus, any two nonsingular completions of $U$ are isometric.
Proof. If $U \subseteq X \subseteq V$ where $X$ is nonsingular, then we may apply Theorem
11.10 to $U$ as a subspace of $X$, to obtain a hyperbolic extension $\mathcal{K} \odot W$ of $U$
for which

$$U \subseteq \mathcal{K} \odot W \subseteq X$$

Thus, every nonsingular extension of $U$ contains a hyperbolic extension of $U$.
Moreover, all hyperbolic extensions of $U$ have the same dimension:

$$\dim(\mathcal{H} \odot W) = \dim(U) + \dim(\operatorname{rad}(U))$$

and so no hyperbolic extension of $U$ is properly contained in another hyperbolic
extension of $U$. This proves that 1)-3) are equivalent. The final statement
follows from the fact that hyperbolic spaces of the same dimension are
isometric. □


Extending Isometries to Nonsingular Completions 


Let $V$ and $V'$ be isometric nonsingular metric vector spaces and let
$U = \operatorname{rad}(U) \odot W$ be a subspace of $V$, with nonsingular completion
$\overline{U} = \mathcal{H} \odot W$.

If $\tau\colon U \to \tau U$ is an isometry, then it is a simple matter to extend $\tau$ to an
isometry $\overline{\tau}$ from $\overline{U}$ onto a nonsingular completion of $\tau U$. To see this, let
$(u_1, z_1, \ldots, u_k, z_k)$ be a hyperbolic basis for $\mathcal{H}$. Since $(u_1, \ldots, u_k)$ is a basis for
$\operatorname{rad}(U)$, it follows that $(\tau u_1, \ldots, \tau u_k)$ is a basis for $\operatorname{rad}(\tau U)$.

Hence, we can hyperbolically extend $\tau U = \operatorname{rad}(\tau U) \odot \tau W$ to get

$$\overline{\tau U} = \mathcal{H}' \odot \tau W$$

where $\mathcal{H}'$ has hyperbolic basis $(\tau u_1, x_1, \ldots, \tau u_k, x_k)$. To extend $\tau$, simply set
$\overline{\tau} z_i = x_i$ for all $i = 1, \ldots, k$.

Theorem 11.12 Let $V$ and $V'$ be isometric nonsingular metric vector spaces
and let $U$ be a subspace of $V$, with nonsingular completion $\overline{U}$. Any isometry
$\tau\colon U \to \tau U$ can be extended to an isometry from $\overline{U}$ onto a nonsingular
completion of $\tau U$. □


The Witt Theorems: A Preview 


There are two important theorems that are quite easy to prove in the case of real
inner product spaces, but require more work in the case of metric vector spaces
in general. Let $V$ and $V'$ be isometric nonsingular metric vector spaces over a
field $F$. We assume that $\operatorname{char}(F) \ne 2$ if $V$ is orthogonal.

The Witt extension theorem says that if $S$ is a subspace of $V$, then any isometry

$$\tau\colon S \to \tau S \subseteq V'$$

can be extended to an isometry from $V$ to $V'$. The Witt cancellation theorem
says that if

$$V = S \odot S^\perp \quad\text{and}\quad V' = T \odot T^\perp$$

then

$$S \approx T \;\Rightarrow\; S^\perp \approx T^\perp$$


We will prove these theorems in both the orthogonal and symplectic cases a bit 
later in the chapter. For now, we simply want to show that it is easy to prove 
one Witt theorem using the other. 


Suppose that the Witt extension theorem holds and assume that

$$V = S \odot S^\perp \quad\text{and}\quad V' = T \odot T^\perp$$

and $S \approx T$. Then any isometry $\tau\colon S \to T$ can be extended to an isometry $\overline{\tau}$ from
$V$ to $V'$. According to Theorem 11.9, we have $\overline{\tau}(S^\perp) = T^\perp$ and so $S^\perp \approx T^\perp$.
Hence, the Witt cancellation theorem holds.

Conversely, suppose that the Witt cancellation theorem holds and let
$\tau\colon S \to \tau S \subseteq V'$ be an isometry. Since $\tau$ can be extended to a nonsingular
completion of $S$, we may assume that $S$ is nonsingular. Then

$$V = S \odot S^\perp$$

Since $\tau$ is an isometry, $\tau S$ is also nonsingular and we can write

$$V' = \tau S \odot (\tau S)^\perp$$

Since $S \approx \tau S$, Witt's cancellation theorem implies that $S^\perp \approx (\tau S)^\perp$. If
$\mu\colon S^\perp \to (\tau S)^\perp$ is an isometry, then the map $\sigma\colon V \to V'$ defined by

$$\sigma(u + v) = \tau u + \mu v$$

for $u \in S$ and $v \in S^\perp$ is an isometry that extends $\tau$. Hence Witt's extension
theorem holds.


The Classification Problem for Metric Vector Spaces 


The classification problem for a class of metric vector spaces (such as the 
orthogonal or symplectic spaces) is the problem of determining when two metric 
vector spaces in the class are isometric. The classification problem is considered 
“solved,” at least in a theoretical sense, by finding a set of canonical forms or a 
complete set of invariants for matrices under congruence. 




To see why, suppose that $\tau\colon V \to W$ is an isometry and $\mathcal{B} = (v_1, \ldots, v_n)$ is an
ordered basis for $V$. Then $\mathcal{C} = (\tau v_1, \ldots, \tau v_n)$ is an ordered basis for $W$ and

$$M_{\mathcal{B}}(V) = (\langle v_i, v_j \rangle) = (\langle \tau v_i, \tau v_j \rangle) = M_{\mathcal{C}}(W)$$

Thus, the congruence class of matrices representing $V$ is identical to the
congruence class of matrices representing $W$.

Conversely, suppose that $V$ and $W$ are metric vector spaces with the same
congruence class of representing matrices. Then if $\mathcal{B} = (v_1, \ldots, v_n)$ is an
ordered basis for $V$, there is an ordered basis $\mathcal{C} = (w_1, \ldots, w_n)$ for $W$ for which

$$(\langle v_i, v_j \rangle) = M_{\mathcal{B}}(V) = M_{\mathcal{C}}(W) = (\langle w_i, w_j \rangle)$$

Hence, the map $\tau\colon V \to W$ defined by $\tau v_i = w_i$ is an isometry from $V$ to $W$.


We have shown that two metric vector spaces are isometric if and only if they 
have the same congruence class of representing matrices. Thus, we can 
determine whether any two metric vector spaces are isometric by representing 
each space with a matrix and determining whether these matrices are congruent, 
using a set of canonical forms or a set of complete invariants. 
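To make the link between isometry and congruence concrete, here is a small sketch. It assumes forms are stored as Gram matrices with integer entries and that a change of basis is given by a matrix `P` whose columns are the new basis vectors; the function name `change_basis` is illustrative.

```python
def change_basis(M, P):
    """Return P^T M P, the Gram matrix of the same form in the basis whose
    coordinate vectors (relative to the old basis) are the columns of P."""
    n = len(M)
    return [[sum(P[k][i] * M[k][l] * P[l][j] for k in range(n) for l in range(n))
             for j in range(n)] for i in range(n)]

M_V = [[0, 1],
       [1, 0]]    # an orthogonal hyperbolic plane in a hyperbolic basis (u, v)
P = [[1, 1],
     [1, -1]]     # columns: u + v and u - v (invertible when char(F) != 2)
M_W = change_basis(M_V, P)
print(M_W)        # [[2, 0], [0, -2]]
```

Since $M_W = P^{T} M_V P$ with $P$ invertible, a space with Gram matrix $\operatorname{diag}(2, -2)$ in some basis is isometric to the hyperbolic plane: the map sending $u + v$ and $u - v$ to that basis is an isometry.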


Symplectic Geometry 


We now turn to a study of the structure of orthogonal and symplectic geometries 
and their isometries. Since the study of the structure (and the structure itself) of 
symplectic geometries is simpler than that of orthogonal geometries, we begin 
with the symplectic case. The reader who is interested only in the orthogonal 
case may omit this section. 


Throughout this section, let V be a nonsingular symplectic geometry. 
The Classification of Symplectic Geometries 


Among the simplest types of metric vector spaces are those that possess an 
orthogonal basis. However, it is easy to see that a symplectic geometry V has an 
orthogonal basis if and only if it is totally degenerate and so no “interesting” 
symplectic geometries have orthogonal bases. 


Thus, in searching for an orthogonal decomposition of $V$, we turn to two-dimensional
subspaces and this puts us in mind of hyperbolic spaces. Let $\mathcal{F}$ be
the family of all hyperbolic subspaces of $V$, which is nonempty since the zero
subspace $\{0\}$ is singular and so has a nonzero hyperbolic extension. Since $V$ is
finite-dimensional, $\mathcal{F}$ has a maximal member $\mathcal{H}$. Since $\mathcal{H}$ is nonsingular, if
$\mathcal{H} \ne V$, then

$$V = \mathcal{H} \odot \mathcal{H}^\perp$$

where $\mathcal{H}^\perp \ne \{0\}$. But then if $v \in \mathcal{H}^\perp$ is nonzero, Theorem 11.10 gives a
hyperbolic plane $K = \operatorname{span}(v, z)$ for which $\mathcal{H} \odot K$ exists. This is a hyperbolic
space that properly contains $\mathcal{H}$, which contradicts the maximality of $\mathcal{H}$. Hence,
$V = \mathcal{H}$.


This proves the following structure theorem for symplectic geometries. 


Theorem 11.13
1) A symplectic geometry has an orthogonal basis if and only if it is totally
degenerate.
2) Any nonsingular symplectic geometry $V$ is a hyperbolic space, that is,

$$V = H_1 \odot H_2 \odot \cdots \odot H_k$$

where each $H_i$ is a hyperbolic plane. Thus, there is a hyperbolic basis for
$V$, that is, a basis $\mathcal{B}$ for which the matrix of the form is

$$\mathcal{J}_{2k} = \begin{bmatrix} 0 & 1 & & & \\ -1 & 0 & & & \\ & & \ddots & & \\ & & & 0 & 1 \\ & & & -1 & 0 \end{bmatrix}$$

In particular, the dimension of $V$ is even.
3) Any symplectic geometry $V$ has the form

$$V = \operatorname{rad}(V) \odot \mathcal{H}$$

where $\mathcal{H}$ is a hyperbolic space and $\operatorname{rad}(V)$ is a totally degenerate space.
The rank of the form is $\dim(\mathcal{H})$ and $V$ is uniquely determined up to
isometry by its rank and its dimension. Put another way, up to isometry,
there is precisely one symplectic geometry of each rank and dimension. □


Symplectic forms are represented by alternate matrices, that is, skew-symmetric
matrices with zero diagonal. Moreover, according to Theorem 11.13, each
$n \times n$ alternate matrix is congruent to a matrix of the form

$$X_{2k,n-2k} = \begin{bmatrix} \mathcal{J}_{2k} & 0 \\ 0 & 0 \end{bmatrix}_{\text{block}}$$

Since the rank of $X_{2k,n-2k}$ is $2k$, no two such matrices are congruent.

Theorem 11.14 The set of $n \times n$ matrices of the form $X_{2k,n-2k}$ is a set of
canonical forms for alternate matrices under congruence. □


The previous theorems solve the classification problem for symplectic
geometries by stating that the rank and dimension of $V$ form a complete set of
invariants under congruence and that the set of all matrices of the form $X_{2k,n-2k}$
is a set of canonical forms.
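The reduction of an alternate matrix to canonical form can be carried out by a symplectic analogue of Gram-Schmidt. The sketch below is one possible procedure, not necessarily the one the author has in mind; it assumes rational entries, and the names `form`, `gram` and `symplectic_basis` are illustrative.

```python
from fractions import Fraction

def form(M, x, y):
    """Bilinear form <x, y> = x^T M y."""
    n = len(M)
    return sum(x[i] * M[i][j] * y[j] for i in range(n) for j in range(n))

def gram(M, basis):
    """Matrix of the form with respect to the given list of vectors."""
    return [[form(M, b, c) for c in basis] for b in basis]

def symplectic_basis(M):
    """For an alternate matrix M over Q, return (pairs, radical): hyperbolic
    pairs (u, v) with <u, v> = 1, and a basis of the radical."""
    n = len(M)
    rest = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    pairs, radical = [], []
    while rest:
        u = rest.pop(0)
        # Look for a remaining vector that is not orthogonal to u.
        k = next((i for i, w in enumerate(rest) if form(M, u, w) != 0), None)
        if k is None:
            radical.append(u)  # u is orthogonal to everything that is left
            continue
        w = rest.pop(k)
        c = form(M, u, w)
        v = [wi / c for wi in w]  # rescale so that <u, v> = 1
        # Replace each remaining x by x - <x, v> u + <x, u> v, which lies in span(u, v)-perp.
        rest = [[xi - form(M, x, v) * ui + form(M, x, u) * vi
                 for xi, ui, vi in zip(x, u, v)] for x in rest]
        pairs.append((u, v))
    return pairs, radical

M = [[0, 1, 2],
     [-1, 0, 3],
     [-2, -3, 0]]   # an alternate 3 x 3 matrix (necessarily singular)
pairs, radical = symplectic_basis(M)
basis = [b for pair in pairs for b in pair] + radical
print(len(pairs), len(radical))  # 1 1
```

In the resulting basis the Gram matrix is $X_{2,1}$: one hyperbolic block followed by a zero row and column.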


Witt's Extension and Cancellation Theorems

We now prove the Witt theorems for symplectic geometries.

Theorem 11.15 (Witt's extension theorem) Let $V$ and $V'$ be isometric
nonsingular symplectic geometries over a field $F$. Then any isometry

$$\tau\colon S \to \tau S \subseteq V'$$

on a subspace $S$ of $V$ can be extended to an isometry from $V$ to $V'$.
Proof. According to Theorem 11.12, we can extend $\tau$ to a nonsingular
completion of $S$, so we may simply assume that $S$ and $\tau S$ are nonsingular.
Hence,

$$V = S \odot S^\perp$$

and

$$V' = \tau S \odot (\tau S)^\perp$$

To complete the extension of $\tau$ to $V$, we need only choose a hyperbolic basis

$$(e_1, f_1, \ldots, e_p, f_p)$$

for $S^\perp$ and a hyperbolic basis

$$(e_1', f_1', \ldots, e_p', f_p')$$

for $(\tau S)^\perp$ and define the extension by setting $\tau e_i = e_i'$ and $\tau f_i = f_i'$. □

As a corollary to Witt's extension theorem, we have Witt's cancellation theorem.

Theorem 11.16 (Witt's cancellation theorem) Let $V$ and $V'$ be isometric
nonsingular symplectic geometries over a field $F$. If

$$V = S \odot S^\perp \quad\text{and}\quad V' = T \odot T^\perp$$

then

$$S \approx T \;\Rightarrow\; S^\perp \approx T^\perp$$ □


The Structure of the Symplectic Group: Symplectic Transvections 


Let us examine the nature of symplectic transformations (isometries) on a
nonsingular symplectic geometry $V$. Recall that for a real vector space, an
isometric isomorphism, which corresponds to an isometry in the present context,
is the same as an orthogonal map and orthogonal maps are products of
reflections (Theorem 10.17). Recall also that a reflection $H_v$ is defined as an
operator for which

$$H_v v = -v, \quad H_v w = w \text{ for all } w \in \langle v \rangle^\perp$$

and that

$$H_v x = x - \frac{2\langle x, v \rangle}{\langle v, v \rangle}\, v$$

In the present context, we do not dare divide by $\langle v, v \rangle$, since all vectors are
isotropic. So here is the next-best thing.


Definition Let $V$ be a nonsingular symplectic geometry over $F$. Let $v \in V$ be
nonzero and let $a \in F$. The map $\tau_{v,a}\colon V \to V$ defined by

$$\tau_{v,a}(x) = x + a\langle x, v \rangle v$$

is called the symplectic transvection determined by $v$ and $a$. □

Note that if $a = 0$, then $\tau_{v,a} = \iota$ and if $a \ne 0$, then $\tau_{v,a}$ is the identity precisely
on the subspace $\operatorname{span}(v)^\perp$ of codimension 1. In the case of a reflection, $H_v$ is the
identity precisely on $\operatorname{span}(v)^\perp$ and

$$V = \operatorname{span}(v)^\perp \odot \operatorname{span}(v)$$

However, for a symplectic transvection, $\tau_{v,a}$ is the identity precisely on
$\operatorname{span}(v)^\perp$ (for $a \ne 0$) but $\operatorname{span}(v) \subseteq \operatorname{span}(v)^\perp$. Here are the basic properties of
symplectic transvections.


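Theorem 11.17 Let $\tau_{v,a}$ be a symplectic transvection on $V$. Then
1) $\tau_{v,a}$ is a symplectic transformation (isometry).
2) $\tau_{v,a} = \iota$ if and only if $a = 0$.
3) If $x \perp v$, then $\tau_{v,a}(x) = x$. For $a \ne 0$, $x \perp v$ if and only if $\tau_{v,a}(x) = x$.
4) $\tau_{v,a}\tau_{v,b} = \tau_{v,a+b}$
5) $\tau_{v,a}^{-1} = \tau_{v,-a}$
6) For any symplectic transformation $\sigma$,

$$\sigma\tau_{v,a}\sigma^{-1} = \tau_{\sigma v,a}$$

7) For $b \in F^*$,

$$\tau_{bv,a} = \tau_{v,ab^2}$$ □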
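Several of these properties can be spot-checked on a small example. The sketch below assumes $V = \mathbb{Q}^2$ with the standard symplectic form $\langle x, y \rangle = x_1 y_2 - x_2 y_1$ (an illustrative choice, with integer test vectors); the names `form` and `transvection` are not from the text.

```python
def form(x, y):
    """Standard symplectic form on a 2-dimensional space (assumed example)."""
    return x[0] * y[1] - x[1] * y[0]

def transvection(v, a):
    """Return the map x -> x + a <x, v> v."""
    def tau(x):
        c = a * form(x, v)
        return tuple(xi + c * vi for xi, vi in zip(x, v))
    return tau

v, a, b = (1, 2), 3, -5
x, y = (4, 7), (-1, 2)
tau_a, tau_b = transvection(v, a), transvection(v, b)

print(form(tau_a(x), tau_a(y)) == form(x, y))             # property 1: isometry
print(tau_a(v) == v)                                      # property 3: v is fixed
print(tau_a(tau_b(x)) == transvection(v, a + b)(x))       # property 4
print(transvection(v, -a)(tau_a(x)) == x)                 # property 5
bv = tuple(b * vi for vi in v)
print(transvection(bv, a)(x) == transvection(v, a * b * b)(x))  # property 7
```

Each line prints `True`. These are checks on sample vectors only, not a proof.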


Note that if $U$ is a subspace of $V$ and if $\tau_{u,a}$ is a symplectic transvection on $U$,
then, by definition, $u \in U$. However, the formula

$$\tau_{u,a}(x) = x + a\langle x, u \rangle u$$

also defines a symplectic transvection on $V$, where $x$ ranges over $V$. Moreover,
for any $z \in U^\perp$, we have $\tau_{u,a} z = z$ and so $\tau_{u,a}$ is the identity on $U^\perp$.

We now wish to prove that any symplectic transformation on a nonsingular
symplectic geometry $V$ is the product of symplectic transvections. The proof is
not difficult, but it is a bit lengthy, so we break it up into parts. Our first goal is
to show that we can get from any hyperbolic pair to any other hyperbolic pair
using a product of symplectic transvections.

Let us say that two hyperbolic pairs $(x, y)$ and $(w, z)$ are connected if there is a
product $\pi$ of symplectic transvections that carries $x$ to $w$ and $y$ to $z$ and write

$$\pi\colon (x, y) \leftrightarrow (w, z)$$

or $(x, y) \leftrightarrow (w, z)$. It is clear that connectedness is an equivalence relation on
the set of hyperbolic pairs.


Theorem 11.18 In a nonsingular symplectic geometry $V$, every pair of
hyperbolic pairs are connected.
Proof. Note first that if $\langle s, t \rangle \ne 0$, then $s \ne t$ and so

$$\tau_{s-t,a}s = s + a\langle s, s - t \rangle(s - t) = s - a\langle s, t \rangle(s - t)$$

Taking $a = 1/\langle s, t \rangle$ gives $\tau_{s-t,a}s = t$. Therefore, if $(s, u)$ is hyperbolic, then we
can always find a vector $x$ for which

$$(s, u) \leftrightarrow (t, x)$$

namely, $x = \tau_{s-t,a}u$. Also, if both $(s, u)$ and $(t, u)$ are hyperbolic, then

$$(s, u) \leftrightarrow (t, u)$$

since $\langle s - t, u \rangle = 0$ and so $\tau_{s-t,a}u = u$.

Actually, these statements are still true if $\langle s, t \rangle = 0$. For in this case, there is a
nonzero vector $y$ for which $\langle s, y \rangle \ne 0$ and $\langle t, y \rangle \ne 0$. This follows from the fact
that there is an $f \in V^*$ for which $fs \ne 0$ and $ft \ne 0$ and so the Riesz vector $R_f$
is such a vector. Therefore, if $(s, u)$ is hyperbolic, then

$$(s, u) \leftrightarrow (y, \tau_{s-y,a}u) \leftrightarrow (t, \tau_{y-t,a'}\tau_{s-y,a}u)$$

and if both $(s, u)$ and $(t, u)$ are hyperbolic, then

$$(s, u) \leftrightarrow (y, u) \leftrightarrow (t, u)$$

Hence, transitivity gives the same result as in the case $\langle s, t \rangle \ne 0$.

Finally, if $(u_1, u_2)$ and $(v_1, v_2)$ are hyperbolic, then there is a $y$ for which

$$(u_1, u_2) \leftrightarrow (v_1, y) \leftrightarrow (v_1, v_2)$$

and so transitivity shows that $(u_1, u_2) \leftrightarrow (v_1, v_2)$. □
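The first step of this proof, carrying $s$ to $t$ with a single transvection when $\langle s, t \rangle \ne 0$, can be sketched as follows. The example assumes $V = \mathbb{Q}^4$ with the standard symplectic form; `carry` is a hypothetical helper name.

```python
from fractions import Fraction

def form(x, y):
    """Standard symplectic form on Q^4 (an illustrative choice of V)."""
    return x[0] * y[1] - x[1] * y[0] + x[2] * y[3] - x[3] * y[2]

def transvection(v, a):
    """Return the map x -> x + a <x, v> v."""
    def tau(x):
        c = a * form(x, v)
        return tuple(xi + c * vi for xi, vi in zip(x, v))
    return tau

def carry(s, t):
    """Return tau_{s-t,a} with a = 1/<s, t>, which sends s to t.
    This one-step construction requires <s, t> != 0."""
    st = form(s, t)
    if st == 0:
        raise ValueError("need <s, t> != 0; otherwise go through an intermediate y")
    diff = tuple(si - ti for si, ti in zip(s, t))
    return transvection(diff, Fraction(1, st))

s, u = (1, 0, 0, 0), (0, 1, 0, 0)   # (s, u) is a hyperbolic pair
t = (1, 2, 0, 0)                    # (t, u) is also hyperbolic and <s, t> = 2
tau = carry(s, t)
print(tau(s) == t, tau(u) == u)     # True True
```

Because $\langle s - t, u \rangle = 0$, the transvection fixes $u$, so $(s, u) \leftrightarrow (t, u)$ in one step.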




We can now show that the symplectic transvections generate the symplectic 
group. 


Theorem 11.19 Every symplectic transformation on a nonsingular symplectic
geometry $V$ is the product of symplectic transvections.
Proof. Let $\mu$ be a symplectic transformation on $V$. We proceed by induction on
$d = \dim(V)$. If $d = 2$, then $V = H = \operatorname{span}(u, z)$ is a hyperbolic plane and
Theorem 11.18 implies that there is a product $\tau$ of symplectic transvections on
$V$ for which

$$\tau\colon (u, z) \leftrightarrow (\mu u, \mu z)$$

This proves the result if $d = 2$. Assume that the result holds for all dimensions
less than $d$ and let $\dim(V) = d$.

Now,

$$V = H \odot K$$

where $H = \operatorname{span}(u, z)$ and $K$ is a symplectic geometry of dimension less than
that of $V$. As before, there is a product $\tau$ of symplectic transvections on $V$ for
which

$$\tau\colon (u, z) \leftrightarrow (\mu u, \mu z)$$

and so

$$\tau|_H = \mu|_H$$

Note that $\tau^{-1}\mu H = H$ and so Theorem 11.9 implies that $\tau^{-1}\mu(H^\perp) = H^\perp$.
Since $\dim(H^\perp) < \dim(V)$, the inductive hypothesis applied to the symplectic
transformation $\tau^{-1}\mu$ on $H^\perp$ implies that there is a product $\pi$ of symplectic
transvections on $H^\perp$ for which $\pi = \tau^{-1}\mu$. As remarked earlier, $\pi$ is also a
product of symplectic transvections on $V$ that is the identity on $H$ and so

$$\pi|_H = \iota|_H \quad\text{and}\quad \mu = \tau\pi \text{ on } H^\perp$$

Thus, $\mu = \tau\pi$ on both $H$ and on $H^\perp$ and so $\mu = \tau\pi$ is a product of symplectic
transvections on $V$. □


The Structure of Orthogonal Geometries: Orthogonal Bases 


We have seen that no interesting (that is, not totally degenerate) symplectic 
geometries have orthogonal bases. By contrast, almost all interesting orthogonal 
geometries V have orthogonal bases. 


To understand why, it is convenient to group the orthogonal geometries into two
classes: those that are also symplectic and those that are not. The reason is that
all orthogonal nonsymplectic geometries have orthogonal bases, as we will see.
However, an orthogonal symplectic geometry has an orthogonal basis if and
only if it is totally degenerate. Furthermore, we have seen that if $\operatorname{char}(F) \ne 2$,
then all orthogonal symplectic geometries are totally degenerate and so all such
geometries have orthogonal bases. But if $\operatorname{char}(F) = 2$, then there are orthogonal
symplectic geometries that are not totally degenerate and therefore do not have
orthogonal bases.

Thus, if we exclude orthogonal symplectic geometries when $\operatorname{char}(F) = 2$, we
can say that every orthogonal geometry has an orthogonal basis.


If a metric vector space $V$ has an orthogonal basis, the natural next step is to
look for an orthonormal basis. However, if $V$ is singular, then there is a nonzero
vector $v \in V^\perp$ and such a vector can never be a linear combination of vectors
from an orthonormal basis $\{u_1, \ldots, u_n\}$, since the coefficients in such a linear
combination are $\langle v, u_i \rangle = 0$.


However, even if V is nonsingular, orthonormal bases do not always exist and 
the question of how close we can come to such an orthonormal basis depends on 
the nature of the base field. We will examine this issue in three cases: 
algebraically closed fields, the field of real numbers and finite fields. 


We should also mention that even when V has an orthogonal basis, the Gram- 
Schmidt orthogonalization process may not apply to produce such a basis, 
because even nonsingular orthogonal geometries may have isotropic vectors, 
and so division by (u, u) is problematic. 


For example, consider an orthogonal hyperbolic plane $H = \operatorname{span}(u, v)$ and
assume that $\operatorname{char}(F) \ne 2$. Thus, $u$ and $v$ are isotropic and $\langle u, v \rangle = 1$. The
vector $u$ cannot be extended to an orthogonal basis using the Gram-Schmidt
process, since $\{u, au + bv\}$ is orthogonal if and only if $b = 0$. However, $H$ does
have an orthogonal basis, namely, $\{u + v, u - v\}$.
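For completeness, the two claims in this example follow from a short computation, using only $\langle u, u \rangle = \langle v, v \rangle = 0$ and $\langle u, v \rangle = \langle v, u \rangle = 1$:

```latex
\begin{align*}
\langle u, au + bv \rangle &= a\langle u, u \rangle + b\langle u, v \rangle = b \\
\langle u + v, u - v \rangle &= \langle u, u \rangle - \langle u, v \rangle + \langle v, u \rangle - \langle v, v \rangle = 0 - 1 + 1 - 0 = 0 \\
\langle u + v, u + v \rangle &= 2\langle u, v \rangle = 2 \\
\langle u - v, u - v \rangle &= -2\langle u, v \rangle = -2
\end{align*}
```

Since $\operatorname{char}(F) \ne 2$, both $2$ and $-2$ are nonzero, so $u + v$ and $u - v$ are nonisotropic and linearly independent.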


Orthogonal Bases 


Let $V$ be an orthogonal geometry. As we have discussed, if $V$ is also
symplectic, then $V$ has an orthogonal basis if and only if it is totally degenerate.
Moreover, when $\operatorname{char}(F) \ne 2$, all orthogonal symplectic geometries are totally
degenerate and so all orthogonal symplectic geometries have an orthogonal
basis.

If $V$ is orthogonal but not symplectic, then $V$ contains a nonisotropic vector $u_1$,
the subspace $\operatorname{span}(u_1)$ is nonsingular and

$$V = \operatorname{span}(u_1) \odot V_1$$

where $V_1 = \operatorname{span}(u_1)^\perp$. If $V_1$ is not symplectic, then we may decompose it to get

$$V = \operatorname{span}(u_1) \odot \operatorname{span}(u_2) \odot V_2$$

This process may be continued until we reach a decomposition

$$V = \operatorname{span}(u_1) \odot \cdots \odot \operatorname{span}(u_k) \odot U$$

where $U$ is symplectic as well as orthogonal. (This includes the case $U = \{0\}$.)
Let $\mathcal{B} = (u_1, \ldots, u_k)$.

If $\operatorname{char}(F) \ne 2$, then $U$ is totally degenerate. Thus, if $\mathcal{C}$ is a basis for $U$, then the
union $\mathcal{B} \cup \mathcal{C}$ is an orthogonal basis for $V$. If $\operatorname{char}(F) = 2$, then
$U = \mathcal{H} \odot \operatorname{rad}(U)$, where $\mathcal{H}$ is hyperbolic and so

$$V = \operatorname{span}(u_1) \odot \cdots \odot \operatorname{span}(u_k) \odot \mathcal{H} \odot \operatorname{rad}(U)$$

where $\operatorname{rad}(U)$ is totally degenerate and the $u_i$ are nonisotropic. If
$\mathcal{C} = (x_1, y_1, \ldots, x_m, y_m)$ is a hyperbolic basis for $\mathcal{H}$ and $\mathcal{D} = (z_1, \ldots, z_r)$ is an
ordered basis for $\operatorname{rad}(U)$, then the union

$$\mathcal{E} = \mathcal{B} \cup \mathcal{C} \cup \mathcal{D} = (u_1, \ldots, u_k, x_1, y_1, \ldots, x_m, y_m, z_1, \ldots, z_r)$$

is an ordered basis for $V$. However, we can do better (in some sense).


The following lemma says that when $\operatorname{char}(F) = 2$, a pair of isotropic basis
vectors, such as $x_1, y_1$, can be replaced by a pair of nonisotropic basis vectors,
when coupled with a nonisotropic basis vector, such as $u_k$.

Lemma 11.20 Suppose that $\operatorname{char}(F) = 2$. Let $W$ be a three-dimensional
orthogonal geometry of the form

$$W = \operatorname{span}(v) \odot \operatorname{span}(x, y)$$

where $v$ is nonisotropic and $H = \operatorname{span}(x, y)$ is a hyperbolic plane. Then

$$W = \operatorname{span}(v_1) \odot \operatorname{span}(v_2) \odot \operatorname{span}(v_3)$$

where each $v_i$ is nonisotropic.
Proof. It is straightforward to check that if $\langle v, v \rangle = a$, then the vectors

$$v_1 = v + x + y$$
$$v_2 = v + ax$$
$$v_3 = v + (1 - a)x + y$$

are linearly independent and mutually orthogonal. Details are left to the
reader. □
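The details left to the reader can be checked mechanically in the smallest case. The sketch below assumes $F = F_2$, so that necessarily $a = 1$, and uses coordinates in the basis $(v, x, y)$; it is a sanity check of one instance, not a proof for general fields of characteristic 2.

```python
from itertools import product

# Gram matrix in the basis (v, x, y) over F_2:
# <v, v> = a = 1, <x, x> = <y, y> = 0, <x, y> = <y, x> = 1.
G = [[1, 0, 0],
     [0, 0, 1],
     [0, 1, 0]]

def form(p, q):
    """Bilinear form reduced mod 2."""
    return sum(p[i] * G[i][j] * q[j] for i in range(3) for j in range(3)) % 2

a = 1                       # the only nonzero scalar in F_2
v1 = (1, 1, 1)              # v + x + y
v2 = (1, a, 0)              # v + a x
v3 = (1, (1 - a) % 2, 1)    # v + (1 - a) x + y
basis = [v1, v2, v3]

nonisotropic = all(form(w, w) != 0 for w in basis)
orthogonal = all(form(basis[i], basis[j]) == 0
                 for i in range(3) for j in range(3) if i != j)
spanned = {tuple(sum(c * w[i] for c, w in zip(cs, basis)) % 2 for i in range(3))
           for cs in product(range(2), repeat=3)}
independent = len(spanned) == 8   # the three vectors span all of F_2^3

print(nonisotropic, orthogonal, independent)  # True True True
```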


Using the previous lemma, we can replace the vectors $\{u_k, x_1, y_1\}$ with the
nonisotropic vectors $\{v_k, v_{k+1}, v_{k+2}\}$, while retaining orthogonality. Moreover,
the replacement process can continue until the isotropic vectors are absorbed,
leaving an orthogonal basis of nonisotropic vectors.




Let us summarize. 


Theorem 11.21 Let $V$ be an orthogonal geometry.
1) If $V$ is also symplectic, then $V$ has an orthogonal basis if and only if it is
totally degenerate. When $\operatorname{char}(F) \ne 2$, all orthogonal symplectic
geometries have an orthogonal basis, but this is not the case when
$\operatorname{char}(F) = 2$.
2) If $V$ is not symplectic, then $V$ has an ordered orthogonal basis
$\mathcal{B} = (u_1, \ldots, u_k, z_1, \ldots, z_m)$ for which $\langle u_i, u_i \rangle = a_i \ne 0$ and $\langle z_i, z_i \rangle = 0$.
Hence, $M_{\mathcal{B}}$ has the diagonal form

$$M_{\mathcal{B}} = \begin{bmatrix} a_1 & & & & & \\ & \ddots & & & & \\ & & a_k & & & \\ & & & 0 & & \\ & & & & \ddots & \\ & & & & & 0 \end{bmatrix}$$

with $k = \operatorname{rk}(M_{\mathcal{B}})$ nonzero entries on the diagonal.

As a corollary, we get a nice theorem about symmetric matrices.

Corollary 11.22 Let $M$ be a symmetric matrix and assume that $M$ is not
alternate if $\operatorname{char}(F) = 2$. Then $M$ is congruent to a diagonal matrix. □


The Classification of Orthogonal Geometries: Canonical Forms


We now want to consider the question of improving upon Theorem 11.21. The
diagonal matrices of this theorem do not form a set of canonical forms for
congruence. In fact, if $r_1, \ldots, r_k$ are nonzero scalars, then the matrix of $V$ with
respect to the basis $\mathcal{C} = (r_1 u_1, \ldots, r_k u_k, z_1, \ldots, z_m)$ is

$$M_{\mathcal{C}} = \begin{bmatrix} r_1^2 a_1 & & & & & \\ & \ddots & & & & \\ & & r_k^2 a_k & & & \\ & & & 0 & & \\ & & & & \ddots & \\ & & & & & 0 \end{bmatrix} \tag{11.2}$$

Hence, $M_{\mathcal{B}}$ and $M_{\mathcal{C}}$ are congruent diagonal matrices. Thus, by a simple change
of basis, we can multiply any diagonal entry by a nonzero square in $F$.


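The determination of a set of canonical forms for symmetric (nonalternate when
$\operatorname{char}(F) = 2$) matrices under congruence depends on the properties of the base
field. Our plan is to consider three types of base fields: algebraically closed
fields, the real field $\mathbb{R}$ and finite fields. Here is a preview of the forthcoming
results.

1) When the base field $F$ is algebraically closed, there is an ordered basis $\mathcal{B}$
for which

$$M_{\mathcal{B}} = Z_{k,m} = \begin{bmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & 1 & & & \\ & & & 0 & & \\ & & & & \ddots & \\ & & & & & 0 \end{bmatrix}$$

If $V$ is nonsingular, then $M_{\mathcal{B}}$ is an identity matrix and $V$ has an
orthonormal basis.
2) Over the real base field, there is an ordered basis $\mathcal{B}$ for which

$$M_{\mathcal{B}} = Z_{p,m,k} = \operatorname{diag}(\underbrace{1, \ldots, 1}_{p}, \underbrace{-1, \ldots, -1}_{m}, \underbrace{0, \ldots, 0}_{k})$$

3) If $F$ is a finite field, there is an ordered basis $\mathcal{B}$ for which

$$M_{\mathcal{B}} = Z_{k,m}(d) = \operatorname{diag}(1, \ldots, 1, d, 0, \ldots, 0)$$

where $d$ is unique up to multiplication by a square and if $\operatorname{char}(F) = 2$, then
we can take $d = 1$.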


Now let us turn to the details. 


Algebraically Closed Fields 


If $F$ is algebraically closed, then for every $r \in F$, the polynomial $x^2 - r$ has a
root in $F$, that is, every element of $F$ has a square root in $F$. Therefore, we may
choose $r_i = 1/\sqrt{a_i}$ in (11.2), which leads to the following result.




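Theorem 11.23 Let $V$ be an orthogonal geometry over an algebraically closed
field $F$. Provided that $V$ is not symplectic as well when $\operatorname{char}(F) = 2$, then $V$
has an ordered orthogonal basis $\mathcal{B} = (u_1, \ldots, u_k, z_1, \ldots, z_m)$ for which
$\langle u_i, u_i \rangle = 1$ and $\langle z_i, z_i \rangle = 0$. Hence, $M_{\mathcal{B}}$ has the diagonal form

$$M_{\mathcal{B}} = Z_{k,m} = \operatorname{diag}(\underbrace{1, \ldots, 1}_{k}, \underbrace{0, \ldots, 0}_{m})$$

with $k$ ones and $m$ zeros on the diagonal. In particular, if $V$ is nonsingular,
then $V$ has an orthonormal basis. □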


The matrix version of Theorem 11.23 follows. 


Theorem 11.24 Let $\mathcal{S}_n$ be the set of all $n \times n$ symmetric matrices over an
algebraically closed field $F$. If $\operatorname{char}(F) = 2$, we restrict $\mathcal{S}_n$ to the set of all
symmetric matrices with at least one nonzero entry on the main diagonal.
1) Any matrix $M$ in $\mathcal{S}_n$ is congruent to a unique matrix of the form $Z_{k,m}$, in
fact, $k = \operatorname{rk}(M)$ and $m = n - \operatorname{rk}(M)$.
2) The set of all matrices of the form $Z_{k,m}$ for $k + m = n$ is a set of canonical
forms for congruence on $\mathcal{S}_n$.
3) The rank of a matrix is a complete invariant for congruence on $\mathcal{S}_n$. □


The Real Field R 
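If $F = \mathbb{R}$, we can choose $r_i = 1/\sqrt{|a_i|}$, so that all diagonal elements in
(11.2) will be either $0$, $1$ or $-1$.

Theorem 11.25 (Sylvester's law of inertia) Any real orthogonal geometry $V$
has an ordered orthogonal basis

$$\mathcal{B} = (u_1, \ldots, u_p, v_1, \ldots, v_m, z_1, \ldots, z_k)$$

for which $\langle u_i, u_i \rangle = 1$, $\langle v_i, v_i \rangle = -1$ and $\langle z_i, z_i \rangle = 0$. Hence, the matrix $M_{\mathcal{B}}$ has
the diagonal form

$$M_{\mathcal{B}} = Z_{p,m,k} = \operatorname{diag}(\underbrace{1, \ldots, 1}_{p}, \underbrace{-1, \ldots, -1}_{m}, \underbrace{0, \ldots, 0}_{k})$$

with $p$ ones, $m$ negative ones and $k$ zeros on the diagonal. □

Here is the matrix version of Theorem 11.25.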
Here is the matrix version of Theorem 11.25. 


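Theorem 11.26 Let $\mathcal{S}_n$ be the set of all $n \times n$ symmetric matrices over the real
field $\mathbb{R}$.
1) Any matrix in $\mathcal{S}_n$ is congruent to a unique matrix of the form $Z_{p,m,k}$, for
some $p$, $m$ and $k = n - p - m$.
2) The set of all matrices of the form $Z_{p,m,k}$ for $p + m + k = n$ is a set of
canonical forms for congruence on $\mathcal{S}_n$.
3) Let $M \in \mathcal{S}_n$ and let $M$ be congruent to $Z_{p,m,k}$. Then $p + m$ is the rank of
$M$ and $p - m$ is the signature of $M$ and the triple $(p, m, k)$ is the inertia
of $M$. The pair $(p, m)$, or equivalently the pair $(p + m, p - m)$, is a
complete invariant under congruence on $\mathcal{S}_n$.
Proof. We need only prove the uniqueness statement in part 1). Let

$$\mathcal{B} = (u_1, \ldots, u_p, v_1, \ldots, v_m, z_1, \ldots, z_k)$$

and

$$\mathcal{C} = (u_1', \ldots, u_{p'}', v_1', \ldots, v_{m'}', z_1', \ldots, z_{k'}')$$

be ordered bases for which the matrices $M_{\mathcal{B}}$ and $M_{\mathcal{C}}$ have the form shown in
Theorem 11.25. Since the rank of these matrices must be equal, we have
$p + m = p' + m'$ and so $k = k'$.

If $x \in \operatorname{span}(u_1, \ldots, u_p)$ and $x \ne 0$, then

$$\langle x, x \rangle = \Big\langle \sum_i r_i u_i, \sum_j r_j u_j \Big\rangle = \sum_{i,j} r_i r_j \langle u_i, u_j \rangle = \sum_{i,j} r_i r_j \delta_{i,j} = \sum_i r_i^2 > 0$$

On the other hand, if $y \in \operatorname{span}(v_1', \ldots, v_{m'}')$ and $y \ne 0$, then

$$\langle y, y \rangle = \Big\langle \sum_i s_i v_i', \sum_j s_j v_j' \Big\rangle = \sum_{i,j} s_i s_j \langle v_i', v_j' \rangle = -\sum_{i,j} s_i s_j \delta_{i,j} = -\sum_i s_i^2 < 0$$

Hence, if $y \in \operatorname{span}(v_1', \ldots, v_{m'}', z_1', \ldots, z_{k'}')$ then $\langle y, y \rangle \le 0$. It follows that

$$\operatorname{span}(u_1, \ldots, u_p) \cap \operatorname{span}(v_1', \ldots, v_{m'}', z_1', \ldots, z_{k'}') = \{0\}$$

and so

$$p + (n - p') \le n$$

that is, $p \le p'$. By symmetry, $p' \le p$ and so $p = p'$. Finally, since $k = k'$, it
follows that $m = m'$. □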
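The inertia of a concrete symmetric matrix can be computed by simultaneous row and column operations, which are congruence steps. The following is a minimal sketch under the assumption that the entries are rational (so exact arithmetic is available); the function name `inertia` and the pivoting strategy are illustrative choices, not taken from the text.

```python
from fractions import Fraction

def inertia(M):
    """Return (p, m, k) for a symmetric matrix M with rational entries, by
    diagonalizing M under congruence and counting signs on the diagonal."""
    n = len(M)
    A = [[Fraction(x) for x in row] for row in M]

    def add_multiple(i, j, c):
        """Congruence step: replace basis vector e_i by e_i + c e_j."""
        for t in range(n):
            A[i][t] += c * A[j][t]   # row_i += c * row_j
        for t in range(n):
            A[t][i] += c * A[t][j]   # col_i += c * col_j

    for i in range(n):
        if A[i][i] == 0:
            # Try to create a nonzero pivot using a later basis vector.
            for j in range(i + 1, n):
                if A[j][j] != 0 or A[i][j] != 0:
                    add_multiple(i, j, 1)
                    if A[i][i] == 0:          # happens only if A[j][j] = -2 A[i][j]
                        add_multiple(i, j, 1)
                    break
        if A[i][i] == 0:
            continue  # e_i is orthogonal to itself and to all later vectors
        for j in range(i + 1, n):
            if A[i][j] != 0:
                add_multiple(j, i, -A[i][j] / A[i][i])  # clear entry (i, j)

    p = sum(1 for i in range(n) if A[i][i] > 0)
    m = sum(1 for i in range(n) if A[i][i] < 0)
    return p, m, n - p - m

print(inertia([[0, 1], [1, 0]]))                        # (1, 1, 0)
print(inertia([[1, 2, 0], [2, 3, -3], [0, -3, -9]]))    # (1, 1, 1)
```

The second matrix equals $P^{T} Z_{1,1,1} P$ for an invertible integer matrix $P$, and the computed inertia $(1, 1, 1)$ agrees with Sylvester's law.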


Finite Fields 


To deal with the case of finite fields, we must know something about the 
distribution of squares in finite fields, as well as the possible values of the 
scalars (v, v). 


Theorem 11.27 Let $F_q$ be a finite field with $q$ elements.

1) If $\operatorname{char}(F_q) = 2$, then every element of $F_q$ is a square.

2) If $\operatorname{char}(F_q) \neq 2$, then exactly half of the nonzero elements of $F_q$ are squares, that is, there are $(q - 1)/2$ nonzero squares in $F_q$. Moreover, if $x$ is any nonsquare in $F_q$, then all nonsquares have the form $r^2 x$, for some $r \in F_q$.

Proof. Write $F = F_q$, let $F^*$ be the subgroup of all nonzero elements in $F$ and let

\[ (F^*)^2 = \{a^2 \mid a \in F^*\} \]

be the subgroup of all nonzero squares in $F$. The squaring map $\phi\colon F^* \to (F^*)^2$ defined by $\phi(a) = a^2$ is a surjective group homomorphism, with kernel

\[ \ker(\phi) = \{a \in F^* \mid a^2 = 1\} = \{-1, 1\} \]

If $\operatorname{char}(F) = 2$, then $\ker(\phi) = \{1\}$ and so $\phi$ is bijective and $|F^*| = |(F^*)^2|$, which proves part 1). If $\operatorname{char}(F) \neq 2$, then $|\ker(\phi)| = 2$ and so $|F^*| = 2|(F^*)^2|$, which proves the first part of part 2). We leave proof of the last statement to the reader. □
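
The count in part 2) is easy to confirm by brute force in small cases. The sketch below is only an illustration under a simplifying assumption: it works in the prime fields $F_p$ (integers modulo an odd prime $p$) rather than in a general $F_q$, and the function names are our own.

```python
def nonzero_squares(p):
    """Return the set of nonzero squares in the prime field F_p."""
    return {(a * a) % p for a in range(1, p)}


def check_theorem_11_27(p):
    """Check both claims of part 2) for an odd prime p (assumed, not verified)."""
    squares = nonzero_squares(p)
    nonsquares = set(range(1, p)) - squares
    # Exactly half of the nonzero elements are squares.
    half = len(squares) == (p - 1) // 2
    # Every nonsquare has the form r^2 * x for a fixed nonsquare x.
    x = min(nonsquares)
    multiples = {(r * r * x) % p for r in range(1, p)}
    return half and multiples == nonsquares
```

For instance, `nonzero_squares(7)` is `{1, 2, 4}`, so the nonsquares modulo 7 are 3, 5 and 6.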


Definition A bilinear form on $V$ is universal if for any nonzero $c \in F$ there exists a vector $v \in V$ for which $\langle v, v\rangle = c$. □


Theorem 11.28 Let $V$ be an orthogonal geometry over a finite field $F$ with $\operatorname{char}(F) \neq 2$ and assume that $V$ has a nonsingular subspace of dimension at least 2. Then the bilinear form of $V$ is universal.

Proof. Theorem 11.21 implies that $V$ contains two linearly independent vectors $u$ and $v$ for which

\[ \langle u, u\rangle = a \neq 0, \quad \langle v, v\rangle = b \neq 0, \quad \langle u, v\rangle = 0 \]

Given any $c \in F$, we want to find $\alpha$ and $\beta$ for which

\[ c = \langle \alpha u + \beta v, \alpha u + \beta v\rangle = a\alpha^2 + b\beta^2 \]

or

\[ a\alpha^2 = c - b\beta^2 \]

If $A = \{a\alpha^2 \mid \alpha \in F\}$, then $|A| = (q + 1)/2$, since there are $(q - 1)/2$ nonzero squares $\alpha^2$, along with $\alpha^2 = 0$. If $B = \{c - b\beta^2 \mid \beta \in F\}$, then for the same reasons $|B| = (q + 1)/2$. Since $|A| + |B| = q + 1 > q = |F|$, it follows that $A \cap B$ cannot be the empty set and so there exist $\alpha$ and $\beta$ for which $a\alpha^2 = c - b\beta^2$. □
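
The counting argument can be checked by exhaustive search in small cases. The following sketch assumes a prime field $F_p$ with $p$ odd (the theorem itself covers any finite field of odd characteristic); the function name is hypothetical.

```python
def diagonal_form_is_universal(a, b, p):
    """Check that a*alpha^2 + b*beta^2 takes every value of F_p.

    a and b are nonzero residues modulo the odd prime p (assumed).
    """
    values = {(a * al * al + b * be * be) % p
              for al in range(p) for be in range(p)}
    return values == set(range(p))
```

The value $c = 0$ is always attained trivially by $\alpha = \beta = 0$; the content of the theorem is that every nonzero $c$ is attained as well.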


Now we can proceed with the business at hand. 


Theorem 11.29 Let $V$ be an orthogonal geometry over a finite field $F$ and assume that $V$ is not symplectic if $\operatorname{char}(F) = 2$. If $\operatorname{char}(F) \neq 2$, then let $d$ be a fixed nonsquare in $F$. For any nonzero $a \in F$, write

\[ X_k(a) = \operatorname{diag}(\underbrace{1, \dots, 1}_{k-1},\ a,\ 0, \dots, 0) \]

where $\operatorname{rk}(X_k(a)) = k$.

1) If $\operatorname{char}(F) = 2$, then there is an ordered basis $\mathcal{B}$ for which $M_{\mathcal{B}} = X_k(1)$.

2) If $\operatorname{char}(F) \neq 2$, then there is an ordered basis $\mathcal{B}$ for which $M_{\mathcal{B}}$ equals $X_k(1)$ or $X_k(d)$.

Proof. We can dispose of the case $\operatorname{char}(F) = 2$ quite easily: Referring to (11.2), since every element of $F$ has a square root, we may take $r_i = (\sqrt{a_i})^{-1}$.

If $\operatorname{char}(F) \neq 2$, then Theorem 11.21 implies that there is an ordered orthogonal basis

\[ \mathcal{B} = (u_1, \dots, u_k, z_1, \dots, z_m) \]

for which $\langle u_i, u_i\rangle = a_i \neq 0$ and $\langle z_i, z_i\rangle = 0$. Hence, $M_{\mathcal{B}}$ has the diagonal form

\[ M_{\mathcal{B}} = \operatorname{diag}(a_1, \dots, a_k, 0, \dots, 0) \]

Now consider the nonsingular orthogonal geometry $V_1 = \operatorname{span}(u_1, u_2)$. According to Theorem 11.28, the form is universal when restricted to $V_1$. Hence, there exists a $v_1 \in V_1$ for which $\langle v_1, v_1\rangle = 1$.

Since $v_1$ is nonisotropic and $V_1$ is nonsingular, the orthogonal complement of $\operatorname{span}(v_1)$ in $V_1$ is spanned by a nonisotropic vector $v_2$. Hence,

\[ \mathcal{B}_1 = (v_1, v_2, u_3, \dots, u_k, z_1, \dots, z_m) \]

is an ordered basis for $V$ for which the matrix $M_{\mathcal{B}_1}$ is diagonal and has a 1 in the upper left entry. We can repeat the process with the subspace $V_2 = \operatorname{span}(v_2, u_3)$. Continuing in this way, we can find an ordered basis

\[ \mathcal{C} = (v_1, v_2, \dots, v_k, z_1, \dots, z_m) \]

for which $M_{\mathcal{C}} = X_k(a)$ for some nonzero $a \in F$. Now, if $a$ is a square in $F$, then we can replace $v_k$ by $(1/\sqrt{a})v_k$, to get a basis $\mathcal{D}$ for which $M_{\mathcal{D}} = X_k(1)$. If $a$ is not a square in $F$, then $a = r^2 d$ for some $r \in F$ and so replacing $v_k$ by $(1/r)v_k$ gives a basis $\mathcal{D}$ for which $M_{\mathcal{D}} = X_k(d)$. □


Theorem 11.30 Let $S_n$ be the set of all $n \times n$ symmetric matrices over a finite field $F$. If $\operatorname{char}(F) = 2$, we restrict $S_n$ to the set of all symmetric matrices with at least one nonzero entry on the main diagonal.

1) If $\operatorname{char}(F) = 2$, then any matrix in $S_n$ is congruent to a unique matrix of the form $X_k(1)$ and the matrices $\{X_k(1) \mid k = 0, \dots, n\}$ form a set of canonical forms for $S_n$ under congruence. Also, the rank is a complete invariant.

2) If $\operatorname{char}(F) \neq 2$, let $d$ be a fixed nonsquare in $F$. Then any matrix in $S_n$ is congruent to a unique matrix of the form $X_k(1)$ or $X_k(d)$. The set $\{X_k(1), X_k(d) \mid k = 0, \dots, n\}$ is a set of canonical forms for congruence on $S_n$. (Thus, there are exactly two congruence classes for each rank $k \ge 1$.) □
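
For a concrete feel for part 2), one can enumerate congruence classes directly in a tiny case. The sketch below is an illustration only: it fixes the field $F_3$ and $2 \times 2$ matrices (both choices are assumptions made to keep the search small) and groups the symmetric matrices into orbits under $M \mapsto P^t M P$.

```python
from itertools import product

P_MOD = 3  # assumed: work over the prime field F_3


def matmul(A, B):
    """Product of two 2x2 matrices over F_3, as nested tuples."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) % P_MOD for j in range(2))
        for i in range(2)
    )


def transpose(A):
    return tuple(zip(*A))


def det(A):
    return (A[0][0] * A[1][1] - A[0][1] * A[1][0]) % P_MOD


def rank(A):
    if det(A) != 0:
        return 2
    return 1 if any(x for row in A for x in row) else 0


def classes_per_rank():
    """Count congruence classes of symmetric 2x2 matrices over F_3 by rank."""
    symmetric = [((a, b), (b, c)) for a, b, c in product(range(P_MOD), repeat=3)]
    invertible = [
        ((a, b), (c, d))
        for a, b, c, d in product(range(P_MOD), repeat=4)
        if det(((a, b), (c, d))) != 0
    ]
    orbits = {
        frozenset(matmul(matmul(transpose(P), M), P) for P in invertible)
        for M in symmetric
    }
    counts = {0: 0, 1: 0, 2: 0}
    for orbit in orbits:
        counts[rank(next(iter(orbit)))] += 1
    return counts
```

In this small case the search finds one class of rank 0 (the zero matrix) and two classes in each of the ranks 1 and 2, as the theorem predicts.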


The Orthogonal Group 


Having “settled” the classification question for orthogonal geometries over 
certain types of fields, let us turn to a discussion of the structure-preserving 
maps, that is, the isometries. 


Rotations and Reflections 


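We begin by examining the matrix of an orthogonal transformation. If $\mathcal{B}$ is an ordered basis for $V$, then for any $x, y \in V$,

\[ \langle x, y\rangle = [x]_{\mathcal{B}}^t M_{\mathcal{B}} [y]_{\mathcal{B}} \]

and so if $\tau \in \mathcal{L}(V)$, then

\[ \langle \tau x, \tau y\rangle = [\tau x]_{\mathcal{B}}^t M_{\mathcal{B}} [\tau y]_{\mathcal{B}} = [x]_{\mathcal{B}}^t \big([\tau]_{\mathcal{B}}^t M_{\mathcal{B}} [\tau]_{\mathcal{B}}\big) [y]_{\mathcal{B}} \]

Hence, $\tau$ is an orthogonal transformation if and only if

\[ [\tau]_{\mathcal{B}}^t M_{\mathcal{B}} [\tau]_{\mathcal{B}} = M_{\mathcal{B}} \]

Taking determinants gives

\[ \det(M_{\mathcal{B}}) = \det([\tau]_{\mathcal{B}})^2 \det(M_{\mathcal{B}}) \]

Therefore, if $V$ is nonsingular, then

\[ \det([\tau]_{\mathcal{B}}) = \pm 1 \]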


Since the determinant is an invariant under similarity, we have the following 
theorem. 


Theorem 11.31 Let $\tau$ be an orthogonal transformation on a nonsingular orthogonal geometry $V$.

1) $\det([\tau]_{\mathcal{B}})$ is the same for all ordered bases $\mathcal{B}$ for $V$ and

\[ \det([\tau]_{\mathcal{B}}) = \pm 1 \]

This determinant is called the determinant of $\tau$ and denoted by $\det(\tau)$.

2) If $\det(\tau) = 1$, then $\tau$ is called a rotation and if $\det(\tau) = -1$, then $\tau$ is called a reflection.

3) The set $O^+(V)$ of rotations is a subgroup of the orthogonal group $O(V)$ and the determinant map $\det\colon O(V) \to \{-1, 1\}$ is an epimorphism with kernel $O^+(V)$. Hence, if $\operatorname{char}(F) \neq 2$, then $O^+(V)$ is a normal subgroup of $O(V)$ of index 2. □
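
The matrix criterion above is easy to test numerically. As a hedged illustration (our own example, with hypothetical helper names), the sketch below uses exact rational arithmetic and the form with matrix $\operatorname{diag}(1, -1)$ on $\mathbb{Q}^2$; it checks the condition $[\tau]^t M [\tau] = M$ and reads off whether the transformation is a rotation or a reflection.

```python
from fractions import Fraction as Fr


def transpose(A):
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def det2(A):
    """Determinant of a 2x2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


def is_orthogonal(T, M):
    """True when T^t M T == M, i.e. T preserves the form with matrix M."""
    return matmul(matmul(transpose(T), M), T) == M


# Matrix of the form <x, y> = x1*y1 - x2*y2 (an assumed example geometry).
M = [[Fr(1), Fr(0)], [Fr(0), Fr(-1)]]
boost = [[Fr(5, 3), Fr(4, 3)], [Fr(4, 3), Fr(5, 3)]]  # determinant +1
flip = [[Fr(1), Fr(0)], [Fr(0), Fr(-1)]]              # determinant -1
```

Here `boost` is a rotation in the sense of Theorem 11.31 and `flip` is a reflection; their product is again orthogonal, with determinant $-1$.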


Symmetries 


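Recall again that for a real inner product space, a reflection $H_u$ is defined as an operator for which

\[ H_u u = -u, \quad H_u w = w \text{ for all } w \in \langle u\rangle^{\perp} \]

and that

\[ H_u x = x - \frac{2\langle x, u\rangle}{\langle u, u\rangle}\, u \]

In particular, if $\operatorname{char}(F) \neq 2$ and $u \in V$ is nonisotropic, then $\operatorname{span}(u)$ is nonsingular and so

\[ V = \operatorname{span}(u) \odot \operatorname{span}(u)^{\perp} \]

Then the reflection $H_u$ is well-defined and, in the context of general orthogonal geometries, is called the symmetry determined by $u$ and we will denote it by $\sigma_u$. We can also write $\sigma_u = -\iota \odot \iota$, that is,

\[ \sigma_u(x + y) = -x + y \]

for all $x \in \operatorname{span}(u)$ and $y \in \operatorname{span}(u)^{\perp}$.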


For real inner product spaces, Theorem 10.16 says that if $\|v\| = \|w\| \neq 0$, then $H_{v-w}$ is the unique reflection sending $v$ to $w$, that is, $H_{v-w}(v) = w$. In the present context, we must be careful, since symmetries are defined for nonisotropic vectors only. Here is what we can say.


Theorem 11.32 Let $V$ be a nonsingular orthogonal geometry over a field $F$, with $\operatorname{char}(F) \neq 2$. If $u, v \in V$ are nonisotropic vectors with the same (nonzero) "length," that is, if

\[ \langle u, u\rangle = \langle v, v\rangle \neq 0 \]

then there exists a symmetry $\sigma$ for which

\[ \sigma u = v \quad \text{or} \quad \sigma u = -v \]

Proof. Since $u$ and $v$ are nonisotropic, one of $u - v$ or $u + v$ must also be nonisotropic, for otherwise, since $u - v$ and $u + v$ are orthogonal, their sum $2u$ would also be isotropic. If $u + v$ is nonisotropic, then

\[ \sigma_{u+v}(u + v) = -(u + v) \]

and

\[ \sigma_{u+v}(u - v) = u - v \]

and so $\sigma_{u+v} u = -v$. On the other hand, if $u - v$ is nonisotropic, then

\[ \sigma_{u-v}(u - v) = -(u - v) \]

and

\[ \sigma_{u-v}(u + v) = u + v \]

and so $\sigma_{u-v} u = v$. □
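
The proof is constructive, and the construction can be sketched in code. The example below is a minimal illustration under stated assumptions: the field is the prime field $F_7$, the form is given by a Gram matrix `gram` relative to the standard basis, and the function names are our own. It requires Python 3.8 or later for the modular inverse `pow(x, -1, P)`.

```python
P = 7  # assumed prime modulus, so all arithmetic is in the field F_7


def form(x, y, gram):
    """Value <x, y> of the symmetric bilinear form with matrix gram, mod P."""
    n = len(gram)
    return sum(x[i] * gram[i][j] * y[j] for i in range(n) for j in range(n)) % P


def symmetry(u, gram):
    """Return sigma_u: x -> x - (2<x,u>/<u,u>) u for a nonisotropic u."""
    uu = form(u, u, gram)
    if uu == 0:
        raise ValueError("u must be nonisotropic")
    inv_uu = pow(uu, -1, P)  # inverse of <u,u> in F_7

    def sigma(x):
        c = 2 * form(x, u, gram) * inv_uu % P
        return tuple((xi - c * ui) % P for xi, ui in zip(x, u))

    return sigma


def symmetry_sending(u, v, gram):
    """Follow the proof of Theorem 11.32: return (sigma, eps) with sigma(u) = eps*v.

    Assumes <u,u> = <v,v> != 0.
    """
    diff = tuple((a - b) % P for a, b in zip(u, v))
    if form(diff, diff, gram) != 0:
        return symmetry(diff, gram), 1       # sigma_{u-v} u = v
    total = tuple((a + b) % P for a, b in zip(u, v))
    return symmetry(total, gram), -1         # sigma_{u+v} u = -v
```

For the form $x_1^2 + x_2^2 - x_3^2$ over $F_7$, the vectors $u = (1, 0, 0)$ and $v = (1, 1, 1)$ have the same length 1 but $u - v$ is isotropic, so only the second case of the theorem applies and the symmetry sends $u$ to $-v$.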


Recall that an operator on a real inner product space is unitary if and only if it is 
a product of reflections. Here is the generalization to nonsingular orthogonal 
geometries. 


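Theorem 11.33 Let $V$ be a nonsingular orthogonal geometry over a field $F$ with $\operatorname{char}(F) \neq 2$. A linear transformation $\tau$ on $V$ is an orthogonal transformation if and only if $\tau$ is the product of symmetries on $V$.

Proof. The proof is by induction on $d = \dim(V)$. If $d = 1$, then $V = \operatorname{span}(v)$ where $\langle v, v\rangle \neq 0$. Let $\tau v = \alpha v$ for $\alpha \in F$. Since $\tau$ is orthogonal,

\[ \alpha^2 \langle v, v\rangle = \langle \alpha v, \alpha v\rangle = \langle \tau v, \tau v\rangle = \langle v, v\rangle \]

and so $\alpha = \pm 1$. If $\alpha = 1$, then $\tau$ is the identity, which is equal to $\sigma_v^2$. On the other hand, if $\alpha = -1$ then $\tau = \sigma_v$. In either case, $\tau$ is a product of symmetries.

Assume now that the theorem is true for dimensions less than $d$ and let $\dim(V) = d$. Let $v \in V$ be nonisotropic. Since $\langle \tau v, \tau v\rangle = \langle v, v\rangle \neq 0$, Theorem 11.32 implies the existence of a symmetry $\sigma$ on $V$ for which

\[ \sigma(\tau v) = \epsilon v \]

where $\epsilon = \pm 1$. Thus, $\sigma\tau = \epsilon\iota$ on $\operatorname{span}(v)$. Since Theorem 11.9 implies that $\operatorname{span}(v)^{\perp}$ is $\sigma\tau$-invariant, we may apply the induction hypothesis to $\sigma\tau$ on $\operatorname{span}(v)^{\perp}$ to get

\[ \sigma\tau|_{\operatorname{span}(v)^{\perp}} = \sigma_{w_1} \cdots \sigma_{w_k} = \rho \]

where $w_i \in \operatorname{span}(v)^{\perp}$ and each $\sigma_{w_i}$ is a symmetry on $\operatorname{span}(v)^{\perp}$. But each $\sigma_{w_i}$ can be extended to a symmetry on $V$ by setting $\sigma_{w_i} v = v$. Let $\bar{\rho}$ be the resulting extension of $\rho$ to $V$, where $\bar{\rho} = \iota$ on $\operatorname{span}(v)$. Hence, $\sigma\tau = \bar{\rho}$ on $\operatorname{span}(v)^{\perp}$ and $\sigma\tau = \epsilon\bar{\rho}$ on $\operatorname{span}(v)$.

If $\epsilon = 1$, then $\sigma\tau = \bar{\rho}$ on $V$ and so $\tau = \sigma\bar{\rho}$, which completes the proof. If $\epsilon = -1$, then $\sigma\tau = \sigma_v\bar{\rho}$ on $\operatorname{span}(v)^{\perp}$, since $\sigma_v$ is the identity on $\operatorname{span}(v)^{\perp}$, and $\sigma\tau = \sigma_v\bar{\rho}$ on $\operatorname{span}(v)$. Hence, $\sigma\tau = \sigma_v\bar{\rho}$ on $V$ and so $\tau = \sigma\sigma_v\bar{\rho}$ on $V$. □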


The Witt Theorems for Orthogonal Geometries 


We are now ready to consider the Witt theorems for orthogonal geometries. 


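Theorem 11.34 (Witt's cancellation theorem) Let $V$ and $W$ be isometric nonsingular orthogonal geometries over a field $F$ with $\operatorname{char}(F) \neq 2$. Suppose that

\[ V = S \odot S^{\perp} \quad \text{and} \quad W = T \odot T^{\perp} \]

Then

\[ S \approx T \ \Rightarrow\ S^{\perp} \approx T^{\perp} \]

Proof. First, we prove that it is sufficient to consider the case $V = W$. Suppose that the result holds when $V = W$ and that $\mu\colon V \to W$ is an isometry. Then

\[ \mu(S) \odot \mu(S^{\perp}) = \mu(S \odot S^{\perp}) = \mu V = W = T \odot T^{\perp} \]

Furthermore, $\mu S \approx S \approx T$. We can therefore apply the theorem to $W$ to get

\[ S^{\perp} \approx \mu(S^{\perp}) \approx T^{\perp} \]

as desired. To prove the theorem when $V = W$, assume that

\[ V = S \odot S^{\perp} = T \odot T^{\perp} \]

where $S$ and $T$ are nonsingular and $S \approx T$. Let $\tau\colon S \to T$ be an isometry. We proceed by induction on $\dim(S)$.

Suppose first that $\dim(S) = 1$ and that $S = \operatorname{span}(s)$. Since

\[ \langle \tau s, \tau s\rangle = \langle s, s\rangle \neq 0 \]

Theorem 11.32 implies that there is a symmetry $\sigma$ for which $\sigma s = \epsilon\tau s$ where $\epsilon = \pm 1$. Hence, $\sigma$ is an isometry of $V$ for which $T = \sigma S$ and Theorem 11.9 implies that $T^{\perp} = \sigma(S^{\perp})$. Thus, $\sigma|_{S^{\perp}}$ is the desired isometry.

Now suppose the theorem is true for $\dim(S) < k$ and let $\dim(S) = k$. Let $\tau\colon S \to T$ be an isometry. Since $S$ is nonsingular, we can choose a nonisotropic vector $s \in S$ and write $S = \operatorname{span}(s) \odot U$, where $U$ is nonsingular. It follows that

\[ V = S \odot S^{\perp} = \operatorname{span}(s) \odot U \odot S^{\perp} \]

and

\[ V = T \odot T^{\perp} = \tau(\operatorname{span}(s)) \odot \tau U \odot T^{\perp} \]

Now we may apply the one-dimensional case to deduce that

\[ U \odot S^{\perp} \approx \tau U \odot T^{\perp} \]

If $\sigma\colon U \odot S^{\perp} \to \tau U \odot T^{\perp}$ is an isometry, then

\[ \sigma U \odot \sigma(S^{\perp}) = \sigma(U \odot S^{\perp}) = \tau U \odot T^{\perp} \]

But $\sigma U \approx \tau U$ and since $\dim(\sigma U) = \dim(U) < k$, the induction hypothesis implies that $S^{\perp} \approx \sigma(S^{\perp}) \approx T^{\perp}$. □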


As we have seen, Witt's extension theorem is a corollary of Witt's cancellation 
theorem. 


Theorem 11.35 (Witt's extension theorem) Let $V$ and $V'$ be isometric nonsingular orthogonal geometries over a field $F$, with $\operatorname{char}(F) \neq 2$. Suppose that $U$ is a subspace of $V$ and

\[ \tau\colon U \to \tau U \subseteq V' \]

is an isometry. Then $\tau$ can be extended to an isometry from $V$ to $V'$. □


Maximal Hyperbolic Subspaces of an Orthogonal Geometry


We have seen that any orthogonal geometry $V$ can be written in the form

\[ V = U \odot \operatorname{rad}(V) \]

where $U$ is nonsingular. Nonsingular spaces are better behaved than singular ones, but they can still possess isotropic vectors.

We can improve upon the preceding decomposition by noticing that if $u \in U$ is isotropic, then Theorem 11.10 implies that $\operatorname{span}(u)$ can be "captured" in a hyperbolic plane $H = \operatorname{span}(u, x)$. Then we can write

\[ V = H \odot H^{\perp_U} \odot \operatorname{rad}(V) \]

where $H^{\perp_U}$ is the orthogonal complement of $H$ in $U$ and has "one fewer" isotropic vector. In order to generalize this process, we first discuss maximal totally degenerate subspaces.


Maximal Totally Degenerate Subspaces 


Let $V$ be a nonsingular orthogonal geometry over a field $F$, with $\operatorname{char}(F) \neq 2$. Suppose that $U$ and $U'$ are maximal totally degenerate subspaces of $V$. We claim that $\dim(U) = \dim(U')$. For if $\dim(U) \le \dim(U')$, then there is a vector space isomorphism $\tau\colon U \to \tau U \subseteq U'$, which is also an isometry, since $U$ and $U'$ are totally degenerate. Thus, Witt's extension theorem implies the existence of an isometry $\bar{\tau}\colon V \to V$ that extends $\tau$. In particular, $\bar{\tau}^{-1}(U')$ is a totally degenerate space that contains $U$ and so $\bar{\tau}^{-1}(U') = U$, which shows that $\dim(U) = \dim(U')$.


Theorem 11.36 Let V be a nonsingular orthogonal geometry over a field F, 

with char(F) # 2. 

1) All maximal totally degenerate subspaces of V have the same dimension, 
which is called the Witt index of V and is denoted by w(V). 

2) Any totally degenerate subspace of V of dimension w(V) is maximal. □


Maximal Hyperbolic Subspaces 


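We can prove by a similar argument that all maximal hyperbolic subspaces of $V$ have the same dimension. Let

\[ \mathcal{H}_{2k} = H_1 \odot \cdots \odot H_k \]

and

\[ \mathcal{K}_{2m} = K_1 \odot \cdots \odot K_m \]

be maximal hyperbolic subspaces of $V$ and suppose that $H_i = \operatorname{span}(u_i, v_i)$ and $K_i = \operatorname{span}(x_i, y_i)$. We may assume that $\dim(\mathcal{H}) \le \dim(\mathcal{K})$.

The linear map $\tau\colon \mathcal{H} \to \mathcal{K}$ defined by

\[ \tau u_i = x_i, \quad \tau v_i = y_i \]

is clearly an isometry from $\mathcal{H}$ to $\tau\mathcal{H}$. Thus, Witt's extension theorem implies the existence of an isometry $\bar{\tau}\colon V \to V$ that extends $\tau$. In particular, $\bar{\tau}^{-1}(\mathcal{K})$ is a hyperbolic space that contains $\mathcal{H}$ and so $\bar{\tau}^{-1}(\mathcal{K}) = \mathcal{H}$. It follows that $\dim(\mathcal{K}) = \dim(\mathcal{H})$.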


It is not hard to see that the maximum dimension $h(V)$ of a hyperbolic subspace of $V$ is $2w(V)$, where $w(V)$ is the Witt index of $V$. First, the nonsingular extension of a maximal totally degenerate subspace $U_w$ of $V$ is a hyperbolic space of dimension $2w(V)$ and so $h(V) \ge 2w(V)$. On the other hand, there is a totally degenerate subspace $U_k$ of dimension $k$ contained in any hyperbolic space $\mathcal{H}_{2k}$ and so $k \le w(V)$, that is, $\dim(\mathcal{H}_{2k}) \le 2w(V)$. Hence $h(V) \le 2w(V)$ and so $h(V) = 2w(V)$.


Theorem 11.37 Let $V$ be a nonsingular orthogonal geometry over a field $F$, with $\operatorname{char}(F) \neq 2$.

1) All maximal hyperbolic subspaces of $V$ have dimension $2w(V)$.

2) Any hyperbolic subspace of dimension $2w(V)$ must be maximal.

3) The Witt index of a hyperbolic space $\mathcal{H}_{2k}$ is $k$. □


The Anisotropic Decomposition of an Orthogonal Geometry 

If $\mathcal{H}$ is a maximal hyperbolic subspace of $V$, then

\[ V = \mathcal{H} \odot \mathcal{H}^{\perp} \]

Since $\mathcal{H}$ is maximal, $\mathcal{H}^{\perp}$ is anisotropic, for if $u \in \mathcal{H}^{\perp}$ were isotropic, then the nonsingular extension of $\mathcal{H} \odot \operatorname{span}(u)$ would be a hyperbolic space strictly larger than $\mathcal{H}$.

Thus, we arrive at the following decomposition theorem for orthogonal geometries.

Theorem 11.38 (The anisotropic decomposition of an orthogonal geometry) Let $V = U \odot \operatorname{rad}(V)$ be an orthogonal geometry over $F$, with $\operatorname{char}(F) \neq 2$. Let $\mathcal{H}$ be a maximal hyperbolic subspace of $U$, where $\mathcal{H} = \{0\}$ if $U$ has no isotropic vectors. Then

\[ V = S \odot \mathcal{H} \odot \operatorname{rad}(V) \]

where $S$ is anisotropic, $\mathcal{H}$ is hyperbolic of dimension $2w(V)$ and $\operatorname{rad}(V)$ is totally degenerate. □


Exercises 


1. Let $U, W$ be subspaces of a metric vector space $V$. Show that
   a) $U \subseteq W \Rightarrow W^{\perp} \subseteq U^{\perp}$
   b) $U \subseteq U^{\perp\perp}$
   c) $U^{\perp} = U^{\perp\perp\perp}$
2. Let $U, W$ be subspaces of a metric vector space $V$. Show that
   a) $(U + W)^{\perp} = U^{\perp} \cap W^{\perp}$
   b) $(U \cap W)^{\perp} = U^{\perp} + W^{\perp}$
3. Prove that the following are equivalent:
   a) $V$ is nonsingular
   b) $\langle u, x\rangle = \langle v, x\rangle$ for all $x \in V$ implies $u = v$
4. Show that a metric vector space $V$ is nonsingular if and only if the matrix $M_{\mathcal{B}}$ of the form is nonsingular, for every ordered basis $\mathcal{B}$.
5. Let $V$ be a finite-dimensional vector space with a bilinear form $\langle\,,\rangle$. We do not assume that the form is symmetric or alternate. Show that the following are equivalent:
   a) $\{v \in V \mid \langle v, w\rangle = 0 \text{ for all } w \in V\} = \{0\}$
   b) $\{v \in V \mid \langle w, v\rangle = 0 \text{ for all } w \in V\} = \{0\}$
   Hint: Consider the singularity of the matrix of the form.
6. Find a diagonal matrix congruent to

\[ \begin{pmatrix} 1 & 2 & 3 \\ 2 & 0 & 1 \\ 3 & 1 & -1 \end{pmatrix} \]
7. Prove that the matrices 


noli Slob t 


are congruent over the base field F = Q of rational numbers. Find an 
invertible matrix P such that P!P = M. 

8. Let $V$ be an orthogonal geometry over a field $F$ with $\operatorname{char}(F) \neq 2$. We wish to construct an orthogonal basis $\mathcal{O} = (u_1, \dots, u_n)$ for $V$, starting with any generating set $\mathcal{G} = (v_1, \dots, v_n)$. Justify the following steps, essentially due to Lagrange. We may assume that $V$ is not totally degenerate.
   a) If $\langle v_i, v_i\rangle \neq 0$ for some $i$, then let $u_1 = v_i$. Otherwise, there are indices $i \neq j$ for which $\langle v_i, v_j\rangle \neq 0$. Let $u_1 = v_i + v_j$.
   b) Assume we have found an ordered set of vectors $\mathcal{O}_k = (u_1, \dots, u_k)$ that form an orthogonal basis for a subspace $V_k$ of $V$ and that none of the $u_i$'s are isotropic. Then $V = V_k \odot V_k^{\perp}$.
   c) For each $v_i \in \mathcal{G}$, let

\[ w_i = v_i - \sum_{j=1}^{k} \frac{\langle v_i, u_j\rangle}{\langle u_j, u_j\rangle}\, u_j \]

   Then the vectors $w_i$ span $V_k^{\perp}$. If $V_k^{\perp}$ is totally degenerate, take any basis for $V_k^{\perp}$ and append it to $\mathcal{O}_k$. Otherwise, repeat step a) on $V_k^{\perp}$ to get another vector $u_{k+1}$ and let $\mathcal{O}_{k+1} = (u_1, \dots, u_{k+1})$. Eventually, we arrive at an orthogonal basis $\mathcal{O}_n$ for $V$.
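
The steps of Exercise 8 can be sketched as a short program. This is one possible reading of the procedure, not a solution to the justification asked for: it assumes the base field is $\mathbb{Q}$ (exact arithmetic via `Fraction`), takes the standard basis as the generating set, and describes the form by its Gram matrix `M`. All function names are hypothetical.

```python
from fractions import Fraction


def form(M, x, y):
    """Value <x, y> of the symmetric bilinear form with Gram matrix M."""
    n = len(M)
    return sum(x[i] * M[i][j] * y[j] for i in range(n) for j in range(n))


def nonisotropic_vector(M, gens):
    """Step a): return some u in span(gens) with <u, u> != 0, or None."""
    for v in gens:
        if form(M, v, v) != 0:
            return v
    for i, v in enumerate(gens):
        for w in gens[i + 1:]:
            if form(M, v, w) != 0:
                # <v+w, v+w> = 2<v, w> != 0 because the characteristic is not 2
                return [a + b for a, b in zip(v, w)]
    return None  # span(gens) is totally degenerate


def independent_subset(vectors):
    """Select a maximal linearly independent subset by Gaussian elimination."""
    chosen, reduced = [], []  # reduced holds (pivot index, reduced row) pairs
    for v in vectors:
        r = list(v)
        for piv, row in reduced:
            if r[piv] != 0:
                c = r[piv] / row[piv]
                r = [a - c * b for a, b in zip(r, row)]
        piv = next((i for i, a in enumerate(r) if a != 0), None)
        if piv is not None:
            reduced.append((piv, r))
            chosen.append(list(v))
    return chosen


def lagrange_orthogonal_basis(M):
    """Return an orthogonal basis (coordinate vectors) for the form M over Q."""
    n = len(M)
    gens = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    basis = []
    while True:
        u = nonisotropic_vector(M, gens)
        if u is None:
            break
        basis.append(u)
        uu = form(M, u, u)
        # Step c): project each generator onto the orthogonal complement of u.
        projected = []
        for v in gens:
            c = form(M, v, u) / uu
            projected.append([a - c * b for a, b in zip(v, u)])
        gens = projected
    # The remaining generators span a totally degenerate subspace.
    return basis + independent_subset(gens)
```

Projecting only against the newest vector at each pass agrees with the sum in step c), because the generators are already orthogonal to the earlier $u_j$. Applied to the matrix of Exercise 6, the Gram matrix of the returned basis is diagonal.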

9. Prove that orthogonal hyperbolic planes may be characterized as two- 
dimensional nonsingular orthogonal geometries that have exactly two one- 
dimensional totally isotropic (equivalently: totally degenerate) subspaces. 

10. Prove that a two-dimensional nonsingular orthogonal geometry is a hyperbolic plane if and only if its discriminant is $F^2(-1)$.

11. Does Minkowski space contain any isotropic vectors? If so, find them.

12. Is Minkowski space isometric to Euclidean space $\mathbb{R}^4$?


13. If $\langle\,,\rangle$ is a symmetric bilinear form on $V$ and $\operatorname{char}(F) \neq 2$, show that $Q(x) = \langle x, x\rangle/2$ is a quadratic form.

14. Let $V$ be a vector space over a field $F$, with ordered basis $\mathcal{B} = (v_1, \dots, v_n)$. Let $p(x_1, \dots, x_n)$ be a homogeneous polynomial of degree $d$ over $F$, that is, a polynomial each of whose terms has degree $d$. The $d$-form defined by $p$ is the function from $V$ to $F$ defined as follows. If $v = \sum a_i v_i$, then

\[ p(v) = p(a_1, \dots, a_n) \]

(We use the same notation for the form and the polynomial.) Prove that 2-forms are the same as quadratic forms.

15. Show that $\tau$ is an isometry on $V$ if and only if $Q(\tau v) = Q(v)$ where $Q$ is the quadratic form associated with the bilinear form on $V$. (Assume that $\operatorname{char}(F) \neq 2$.)

16. Show that a quadratic form $Q$ on $V$ satisfies the parallelogram law:

\[ Q(x + y) + Q(x - y) = 2[Q(x) + Q(y)] \]

17. Show that if $V$ is a nonsingular orthogonal geometry over a field $F$, with $\operatorname{char}(F) \neq 2$, then any totally isotropic subspace of $V$ is also a totally degenerate space.

18. Is it true that $V = \operatorname{rad}(V) \odot \operatorname{rad}(V)^{\perp}$?

19. Let $V$ be a nonsingular symplectic geometry and let $\tau_{v,a}$ be a symplectic transvection. Prove that
   a) $\tau_{v,a}\tau_{v,b} = \tau_{v,a+b}$
   b) For any symplectic transformation $\sigma$,

\[ \sigma\tau_{v,a}\sigma^{-1} = \tau_{\sigma v,a} \]

   c) For $b \in F^*$,

\[ \tau_{bv,a} = \tau_{v,ab^2} \]

   d) For a fixed $v \neq 0$, the map $a \mapsto \tau_{v,a}$ is an isomorphism from the additive group of $F$ onto the group $\{\tau_{v,a} \mid a \in F\} \subseteq \operatorname{Sp}(V)$.

20. Prove that if $x$ is any nonsquare in a finite field $F_q$, then all nonsquares have the form $r^2 x$, for some $r \in F_q$. Hence, the product of any two nonsquares in $F_q$ is a square.

21. Formulate Sylvester's law of inertia in terms of quadratic forms on $V$.

22. Show that a two-dimensional space is a hyperbolic plane if and only if it is nonsingular and contains an isotropic vector. Assume that $\operatorname{char}(F) \neq 2$.

23. Prove directly that a hyperbolic plane in an orthogonal geometry cannot have an orthogonal basis when $\operatorname{char}(F) = 2$.

24. a) Let $U$ be a subspace of $V$. Show that the inner product $\langle x + U, y + U\rangle = \langle x, y\rangle$ on the quotient space $V/U$ is well-defined if and only if $U \subseteq \operatorname{rad}(V)$.
   b) If $U \subseteq \operatorname{rad}(V)$, when is $V/U$ nonsingular?

25. Let $V = N \odot S$, where $N$ is a totally degenerate space.


   a) Prove that $N = \operatorname{rad}(V)$ if and only if $S$ is nonsingular.
   b) If $S$ is nonsingular, prove that $S \approx V/\operatorname{rad}(V)$.

26. Let $\dim(V) = \dim(W)$. Prove that $V/\operatorname{rad}(V) \approx W/\operatorname{rad}(W)$ implies $V \approx W$.

27. Let $V = S \odot T$. Prove that
   a) $\operatorname{rad}(V) = \operatorname{rad}(S) \odot \operatorname{rad}(T)$
   b) $V/\operatorname{rad}(V) \approx S/\operatorname{rad}(S) \odot T/\operatorname{rad}(T)$
   c) $\dim(\operatorname{rad}(V)) = \dim(\operatorname{rad}(S)) + \dim(\operatorname{rad}(T))$
   d) $V$ is nonsingular if and only if $S$ and $T$ are both nonsingular.

28. Let $V$ be a nonsingular metric vector space. Because the Riesz representation theorem is valid in $V$, we can define the adjoint $\tau^*$ of a linear map $\tau \in \mathcal{L}(V)$ exactly as in the case of real inner product spaces. Prove that $\tau$ is an isometry if and only if it is bijective and unitary (that is, $\tau\tau^* = \iota$).

29. If $\operatorname{char}(F) \neq 2$, prove that $\tau \in \mathcal{L}(V, W)$ is an isometry if and only if it is bijective and $\langle \tau v, \tau v\rangle = \langle v, v\rangle$ for all $v \in V$.

30. Let $\mathcal{B} = \{v_1, \dots, v_n\}$ be a basis for $V$. Prove that $\tau \in \mathcal{L}(V, W)$ is an isometry if and only if it is bijective and $\langle \tau v_i, \tau v_j\rangle = \langle v_i, v_j\rangle$ for all $i, j$.

31. Let $\tau$ be a linear operator on a metric vector space $V$. Let $\mathcal{B} = (v_1, \dots, v_n)$ be an ordered basis for $V$ and let $M_{\mathcal{B}}$ be the matrix of the form relative to $\mathcal{B}$. Prove that $\tau$ is an isometry if and only if

\[ [\tau]_{\mathcal{B}}^t M_{\mathcal{B}} [\tau]_{\mathcal{B}} = M_{\mathcal{B}} \]

32. Let $V$ be a nonsingular orthogonal geometry and let $\tau \in \mathcal{L}(V)$ be an isometry.
   a) Show that $\dim(\ker(\iota - \tau)) = \dim(\operatorname{im}(\iota - \tau)^{\perp})$.
   b) Show that $\ker(\iota - \tau) = \operatorname{im}(\iota - \tau)^{\perp}$. How would you describe $\ker(\iota - \tau)$ in words?
   c) If $\tau$ is a symmetry, what is $\dim(\ker(\iota - \tau))$?
   d) Can you characterize symmetries by means of $\dim(\ker(\iota - \tau))$?

33. A linear transformation $\tau \in \mathcal{L}(V)$ is called unipotent if $\tau - \iota$ is nilpotent. Suppose that $V$ is a nonisotropic metric vector space and that $\tau$ is unipotent and isometric. Show that $\tau = \iota$.

34. Let $V$ be a hyperbolic space of dimension $2m$ and let $U$ be a hyperbolic subspace of $V$ of dimension $2k$. Show that for each $k \le j \le m$, there is a hyperbolic subspace $\mathcal{H}_{2j}$ of $V$ for which $U \subseteq \mathcal{H}_{2j} \subseteq V$.

35. Let $\operatorname{char}(F) \neq 2$. Prove that if $X$ is a totally degenerate subspace of an orthogonal geometry $V$, then $\dim(X) \le \dim(V)/2$.

36. Prove that an orthogonal geometry $V$ of dimension $n$ is a hyperbolic space if and only if $V$ is nonsingular, $n$ is even and $V$ contains a totally degenerate subspace of dimension $n/2$.

37. Prove that a symplectic transformation has determinant equal to 1.


Chapter 12 
Metric Spaces 


The Definition 


In Chapter 9, we studied the basic properties of real and complex inner product 
spaces. Much of what we did does not depend on whether the space in question 
is finite-dimensional or infinite-dimensional. However, as we discussed in 
Chapter 9, the presence of an inner product and hence a metric, on a vector 
space, raises a host of new issues related to convergence. In this chapter, we 
discuss briefly the concept of a metric space. This will enable us to study the 
convergence properties of real and complex inner product spaces. 


A metric space is not an algebraic structure. Rather it is designed to model the 
abstract properties of distance. 


Definition A metric space is a pair $(M, d)$, where $M$ is a nonempty set and $d\colon M \times M \to \mathbb{R}$ is a real-valued function, called a metric on $M$, with the following properties. The expression $d(x, y)$ is read "the distance from $x$ to $y$."
1) (Positive definiteness) For all $x, y \in M$,

\[ d(x, y) \ge 0 \]

and $d(x, y) = 0$ if and only if $x = y$.
2) (Symmetry) For all $x, y \in M$,

\[ d(x, y) = d(y, x) \]

3) (Triangle inequality) For all $x, y, z \in M$,

\[ d(x, y) \le d(x, z) + d(z, y) \]
□


As is customary, when there is no cause for confusion, we simply say “let M be 
a metric space.” 


Example 12.1 Any nonempty set $M$ is a metric space under the discrete metric, defined by

\[ d(x, y) = \begin{cases} 0 & \text{if } x = y \\ 1 & \text{if } x \neq y \end{cases} \]
□


Example 12.2
1) The set $\mathbb{R}^n$ is a metric space, under the metric defined for $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ by

\[ d(x, y) = \sqrt{(x_1 - y_1)^2 + \cdots + (x_n - y_n)^2} \]

This is called the Euclidean metric on $\mathbb{R}^n$. We note that $\mathbb{R}^n$ is also a metric space under the metric

\[ d_1(x, y) = |x_1 - y_1| + \cdots + |x_n - y_n| \]

Of course, $(\mathbb{R}^n, d)$ and $(\mathbb{R}^n, d_1)$ are different metric spaces.
2) The set $\mathbb{C}^n$ is a metric space under the unitary metric

\[ d(x, y) = \sqrt{|x_1 - y_1|^2 + \cdots + |x_n - y_n|^2} \]

where $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ are in $\mathbb{C}^n$. □
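
The three axioms can be spot-checked on a finite sample of points. The sketch below is only a sanity check, not a proof: passing on a sample does not establish that a function is a metric, although a failure does show that it is not. The function names and the tolerance `tol` are our own choices.

```python
import math
from itertools import product


def discrete(x, y):
    return 0 if x == y else 1


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def taxicab(x, y):
    """The metric d_1 of Example 12.2."""
    return sum(abs(a - b) for a, b in zip(x, y))


def satisfies_metric_axioms(d, points, tol=1e-12):
    """Check the three metric axioms on every triple from a finite sample."""
    for x, y, z in product(points, repeat=3):
        if d(x, y) < 0:
            return False                      # nonnegativity
        if (d(x, y) == 0) != (x == y):
            return False                      # d(x, y) = 0 iff x = y
        if abs(d(x, y) - d(y, x)) > tol:
            return False                      # symmetry
        if d(x, y) > d(x, z) + d(z, y) + tol:
            return False                      # triangle inequality
    return True
```

By contrast, the squared distance $(x - y)^2$ on $\mathbb{R}$ fails the triangle inequality already on the points 0, 1, 2, since $4 > 1 + 1$.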


Example 12.3
1) The set $C[a, b]$ of all real-valued (or complex-valued) continuous functions on $[a, b]$ is a metric space, under the metric

\[ d(f, g) = \sup_{x \in [a,b]} |f(x) - g(x)| \]

We refer to this metric as the sup metric.
2) The set $C[a, b]$ of all real-valued (or complex-valued) continuous functions on $[a, b]$ is a metric space, under the metric

\[ d_1(f, g) = \int_a^b |f(x) - g(x)|\, dx \]
□


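Example 12.4 Many important sequence spaces are metric spaces. We will often use boldface italic letters to denote sequences, as in $\boldsymbol{x} = (x_n)$ and $\boldsymbol{y} = (y_n)$.

1) The set $\ell^{\infty}$ of all bounded sequences of real numbers is a metric space under the metric defined by

\[ d(\boldsymbol{x}, \boldsymbol{y}) = \sup_n |x_n - y_n| \]

The set $\ell^{\infty}$ of all bounded complex sequences, with the same metric, is also a metric space. As is customary, we will usually denote both of these spaces by $\ell^{\infty}$.

2) For $p \ge 1$, let $\ell^p$ be the set of all sequences $\boldsymbol{x} = (x_n)$ of real (or complex) numbers for which

\[ \sum_{n=1}^{\infty} |x_n|^p < \infty \]

We define the $p$-norm of $\boldsymbol{x}$ by

\[ \|\boldsymbol{x}\|_p = \Big(\sum_{n=1}^{\infty} |x_n|^p\Big)^{1/p} \]

Then $\ell^p$ is a metric space, under the metric

\[ d(\boldsymbol{x}, \boldsymbol{y}) = \|\boldsymbol{x} - \boldsymbol{y}\|_p = \Big(\sum_{n=1}^{\infty} |x_n - y_n|^p\Big)^{1/p} \]

The fact that $\ell^p$ is a metric space follows from some rather famous results about sequences of real or complex numbers, whose proofs we leave as (well-hinted) exercises.

Hölder's inequality Let $p, q > 1$ and $p + q = pq$. If $\boldsymbol{x} \in \ell^p$ and $\boldsymbol{y} \in \ell^q$, then the product sequence $\boldsymbol{xy} = (x_n y_n)$ is in $\ell^1$ and

\[ \|\boldsymbol{xy}\|_1 \le \|\boldsymbol{x}\|_p \|\boldsymbol{y}\|_q \]

that is,

\[ \sum_{n=1}^{\infty} |x_n y_n| \le \Big(\sum_{n=1}^{\infty} |x_n|^p\Big)^{1/p} \Big(\sum_{n=1}^{\infty} |y_n|^q\Big)^{1/q} \]

A special case of this (with $p = q = 2$) is the Cauchy–Schwarz inequality

\[ \sum_{n=1}^{\infty} |x_n y_n| \le \sqrt{\sum_{n=1}^{\infty} |x_n|^2}\, \sqrt{\sum_{n=1}^{\infty} |y_n|^2} \]

Minkowski's inequality For $p \ge 1$, if $\boldsymbol{x}, \boldsymbol{y} \in \ell^p$ then the sum $\boldsymbol{x} + \boldsymbol{y} = (x_n + y_n)$ is in $\ell^p$ and

\[ \|\boldsymbol{x} + \boldsymbol{y}\|_p \le \|\boldsymbol{x}\|_p + \|\boldsymbol{y}\|_p \]

that is,

\[ \Big(\sum_{n=1}^{\infty} |x_n + y_n|^p\Big)^{1/p} \le \Big(\sum_{n=1}^{\infty} |x_n|^p\Big)^{1/p} + \Big(\sum_{n=1}^{\infty} |y_n|^p\Big)^{1/p} \]
□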
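
Both inequalities can be explored numerically on finitely supported sequences, which belong to every $\ell^p$. The sketch below is an illustration only (floating-point arithmetic, hypothetical function names); each function returns the right-hand side minus the left-hand side, which the inequalities say is nonnegative.

```python
def p_norm(x, p):
    """The p-norm of a finite list, viewed as a sequence padded with zeros."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def holder_gap(x, y, p):
    """||x||_p * ||y||_q - ||xy||_1, where q = p/(p-1) so that p + q = pq (p > 1)."""
    q = p / (p - 1)
    lhs = sum(abs(a * b) for a, b in zip(x, y))
    return p_norm(x, p) * p_norm(y, q) - lhs


def minkowski_gap(x, y, p):
    """||x||_p + ||y||_p - ||x + y||_p for p >= 1."""
    s = [a + b for a, b in zip(x, y)]
    return p_norm(x, p) + p_norm(y, p) - p_norm(s, p)
```

With $p = 2$ and $\boldsymbol{y} = \boldsymbol{x}$, the Hölder gap is zero up to rounding, which is the equality case of the Cauchy–Schwarz inequality.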
n=1 n=1 n=1 


If $M$ is a metric space under a metric $d$, then any nonempty subset $S$ of $M$ is also a metric space under the restriction of $d$ to $S \times S$. The metric space $S$ thus obtained is called a subspace of $M$.


Open and Closed Sets 


Definition Let $M$ be a metric space. Let $x_0 \in M$ and let $r$ be a positive real number.
1) The open ball centered at $x_0$, with radius $r$, is

\[ B(x_0, r) = \{x \in M \mid d(x, x_0) < r\} \]

2) The closed ball centered at $x_0$, with radius $r$, is

\[ \bar{B}(x_0, r) = \{x \in M \mid d(x, x_0) \le r\} \]

3) The sphere centered at $x_0$, with radius $r$, is

\[ S(x_0, r) = \{x \in M \mid d(x, x_0) = r\} \]
□


Definition A subset $S$ of a metric space $M$ is said to be open if each point of $S$ is the center of an open ball that is contained completely in $S$. More specifically, $S$ is open if for all $x \in S$, there exists an $r > 0$ such that $B(x, r) \subseteq S$. Note that the empty set is open. A set $T \subseteq M$ is closed if its complement $T^c$ in $M$ is open. □


It is easy to show that an open ball is an open set and a closed ball is a closed set. If $x \in M$, we refer to any open set $S$ containing $x$ as an open neighborhood of $x$. It is also easy to see that a set is open if and only if it contains an open neighborhood of each of its points.


The next example shows that it is possible for a set to be both open and closed, or neither open nor closed.


Example 12.5 In the metric space $\mathbb{R}$ with the usual Euclidean metric, the open balls are just the open intervals

\[ B(x_0, r) = (x_0 - r, x_0 + r) \]

and the closed balls are the closed intervals

\[ \bar{B}(x_0, r) = [x_0 - r, x_0 + r] \]

Consider the half-open interval $S = (a, b]$, for $a < b$. This set is not open, since it contains no open ball centered at $b \in S$. It is also not closed, since its complement $S^c = (-\infty, a] \cup (b, \infty)$ is not open: it contains no open ball about $a$.

Observe also that the empty set is both open and closed, as is the entire space $\mathbb{R}$. (Although we will not do so, it is possible to show that these are the only two sets that are both open and closed in $\mathbb{R}$.) □


It is not our intention to enter into a detailed discussion of open and closed sets, 
the subject of which belongs to the branch of mathematics known as topology. 
In order to put these concepts in perspective, however, we have the following 
result, whose proof is left to the reader. 


Theorem 12.1 The collection $\mathcal{O}$ of all open subsets of a metric space $M$ has the following properties:

1) $\emptyset \in \mathcal{O}$, $M \in \mathcal{O}$

2) If $S, T \in \mathcal{O}$ then $S \cap T \in \mathcal{O}$

3) If $\{S_i \mid i \in K\}$ is any collection of open sets, then $\bigcup_{i \in K} S_i \in \mathcal{O}$. □

These three properties form the basis for an axiom system that is designed to generalize notions such as convergence and continuity and leads to the following definition.


Definition Let $X$ be a nonempty set. A collection $\mathcal{O}$ of subsets of $X$ is called a topology for $X$ if it has the following properties:
1) $\emptyset \in \mathcal{O}$, $X \in \mathcal{O}$
2) If $S, T \in \mathcal{O}$ then $S \cap T \in \mathcal{O}$
3) If $\{S_i \mid i \in K\}$ is any collection of sets in $\mathcal{O}$, then $\bigcup_{i \in K} S_i \in \mathcal{O}$.
We refer to subsets in $\mathcal{O}$ as open sets and the pair $(X, \mathcal{O})$ as a topological space. □


According to Theorem 12.1, the open sets (as we defined them earlier) in a 
metric space M form a topology for M, called the topology induced by the 
metric. 


Topological spaces are the most general setting in which we can define concepts 
such as convergence and continuity, which is why these concepts are called 
topological concepts. However, since the topologies with which we will be 
dealing are induced by a metric, we will generally phrase the definitions of the 
topological properties that we will need directly in terms of the metric. 


Convergence in a Metric Space 


Convergence of sequences in a metric space is defined as follows. 


Definition A sequence $(x_n)$ in a metric space $M$ converges to $x \in M$, written $(x_n) \to x$, if

\[ \lim_{n \to \infty} d(x_n, x) = 0 \]

Equivalently, $(x_n) \to x$ if for any $\epsilon > 0$, there exists an $N > 0$ such that

\[ n > N \Rightarrow d(x_n, x) < \epsilon \]

or equivalently,

\[ n > N \Rightarrow x_n \in B(x, \epsilon) \]

In this case, $x$ is called the limit of the sequence $(x_n)$. □
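
To make the role of $N$ concrete, here is a small sketch for one specific sequence, $x_n = 1/n$ in $\mathbb{R}$ with limit 0 (our own example). For this sequence $N = \lfloor 1/\epsilon \rfloor$ works, since $n > N$ forces $n > 1/\epsilon$. Exact rationals are used so that boundary cases such as $\epsilon = 1/10$ are not blurred by rounding; a program can of course only test finitely many terms of the tail.

```python
import math
from fractions import Fraction


def tail_index(eps):
    """Return an N such that n > N implies |1/n - 0| < eps, for x_n = 1/n."""
    return math.floor(1 / eps)


def tail_stays_within(eps, how_many=1000):
    """Check d(x_n, 0) < eps for the first `how_many` indices n > N."""
    N = tail_index(eps)
    return all(abs(Fraction(1, n) - 0) < eps
               for n in range(N + 1, N + 1 + how_many))
```

For $\epsilon = 1/10$ this gives $N = 10$; the term $x_{10} = 1/10$ itself is not within $\epsilon$ of the limit, but every later term is.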


If M is a metric space and S is a subset of M, by a sequence in S, we mean a 
sequence whose terms all lie in S. We next characterize closed sets and 
therefore also open sets, using convergence. 


Theorem 12.2 Let $M$ be a metric space. A subset $S \subseteq M$ is closed if and only if whenever $(x_n)$ is a sequence in $S$ and $(x_n) \to x$, then $x \in S$. In loose terms, a subset $S$ is closed if it is closed under the taking of sequential limits.

Proof. Suppose that $S$ is closed and let $(x_n) \to x$, where $x_n \in S$ for all $n$. Suppose that $x \notin S$. Then since $x \in S^c$ and $S^c$ is open, there exists an $\epsilon > 0$ for which $x \in B(x, \epsilon) \subseteq S^c$. But this implies that

\[ B(x, \epsilon) \cap \{x_n\} = \emptyset \]

which contradicts the fact that $(x_n) \to x$. Hence, $x \in S$.

Conversely, suppose that $S$ is closed under the taking of limits. We show that $S^c$ is open. Let $x \in S^c$ and suppose to the contrary that no open ball about $x$ is contained in $S^c$. Consider the open balls $B(x, 1/n)$, for all $n \ge 1$. Since none of these balls is contained in $S^c$, for each $n$, there is an $x_n \in S \cap B(x, 1/n)$. It is clear that $(x_n) \to x$ and so $x \in S$. But $x$ cannot be in both $S$ and $S^c$. This contradiction implies that $S^c$ is open. Thus, $S$ is closed. □


The Closure of a Set 


Definition Let S be any subset of a metric space M. The closure of S, denoted 
by cl(S), is the smallest closed set containing S. □


We should hasten to add that, since the entire space M is closed and since the 
intersection of any collection of closed sets is closed (exercise), the closure of 
any set S does exist and is the intersection of all closed sets containing S. The 
following definition will allow us to characterize the closure in another way. 


Definition Let $S$ be a nonempty subset of a metric space $M$. An element $x \in M$ is said to be a limit point, or accumulation point, of $S$ if every open ball centered at $x$ meets $S$ at a point other than $x$ itself. Let us denote the set of all limit points of $S$ by $\ell(S)$. □


Here are some key facts concerning limit points and closures. 


Metric Spaces 307 


Theorem 12.3 Let S be a nonempty subset of a metric space M. 

I) x € L(S) ifand only if there is a sequence (xn) in S for which £n # x for 
all n and (£) > 7. 

2) S is closed if and only if €(S') C S. In words, S is closed if and only if it 
contains all of its limit points. 

3) cl(S)=SUL(S). 

4) x €cl(S) ifand only if there is a sequence (x,,) in S for which (a) > z. 

Proof. For part 1), assume first that $x \in \ell(S)$. For each $n$, there exists a point $x_n \ne x$ such that $x_n \in B(x, 1/n) \cap S$. Thus, we have

$$d(x_n, x) < 1/n$$

and so $(x_n) \to x$. For the converse, suppose that $(x_n) \to x$, where $x \ne x_n \in S$. If $B(x,r)$ is any ball centered at $x$, then there is some $N$ such that $n > N$ implies $x_n \in B(x,r)$. Hence, for any ball $B(x,r)$ centered at $x$, there is a point $x_n \ne x$ such that $x_n \in S \cap B(x,r)$. Thus, $x$ is a limit point of $S$.


As for part 2), if $S$ is closed, then by part 1), any $x \in \ell(S)$ is the limit of a sequence $(x_n)$ in $S$ and so must be in $S$. Hence, $\ell(S) \subseteq S$. Conversely, if $\ell(S) \subseteq S$, then $S$ is closed. For if $(x_n)$ is any sequence in $S$ and $(x_n) \to x$, then there are two possibilities. First, we might have $x_n = x$ for some $n$, in which case $x = x_n \in S$. Second, we might have $x_n \ne x$ for all $n$, in which case $(x_n) \to x$ implies that $x \in \ell(S) \subseteq S$. In either case, $x \in S$ and so $S$ is closed under the taking of limits, which implies that $S$ is closed.


For part 3), let $T = S \cup \ell(S)$. Clearly, $S \subseteq T$. To show that $T$ is closed, we show that it contains all of its limit points. So let $x \in \ell(T)$. Hence, there is a sequence $(x_n)$ in $T$ for which $x_n \ne x$ and $(x_n) \to x$. Of course, each $x_n$ is either in $S$, or is a limit point of $S$. We must show that $x \in T$, that is, that $x$ is either in $S$ or is a limit point of $S$.

Suppose for the purposes of contradiction that $x \notin S$ and $x \notin \ell(S)$. Then there is a ball $B(x,r)$ for which $B(x,r) \cap S = \emptyset$. However, since $(x_n) \to x$, there must be an $x_n \in B(x,r)$. Since $x_n$ cannot be in $S$, it must be a limit point of $S$. Referring to Figure 12.1, if $d(x_n, x) = d < r$, then consider the ball $B(x_n, (r-d)/2)$. This ball is completely contained in $B(x,r)$ and must contain an element $y$ of $S$, since its center $x_n$ is a limit point of $S$. But then $y \in S \cap B(x,r)$, a contradiction. Hence, $x \in S$ or $x \in \ell(S)$. In either case, $x \in T = S \cup \ell(S)$ and so $T$ is closed.

Thus, $T$ is closed and contains $S$ and so $\operatorname{cl}(S) \subseteq T$. On the other hand, $T = S \cup \ell(S) \subseteq \operatorname{cl}(S)$ and so $\operatorname{cl}(S) = T$.


308 Advanced Linear Algebra 




Figure 12.1 


For part 4), if $x \in \operatorname{cl}(S)$, then there are two possibilities. If $x \in S$, then the constant sequence $(x_n)$, with $x_n = x$ for all $n$, is a sequence in $S$ that converges to $x$. If $x \notin S$, then $x \in \ell(S)$ and so there is a sequence $(x_n)$ in $S$ for which $x_n \ne x$ and $(x_n) \to x$. In either case, there is a sequence in $S$ converging to $x$. Conversely, if there is a sequence $(x_n)$ in $S$ for which $(x_n) \to x$, then either $x_n = x$ for some $n$, in which case $x \in S \subseteq \operatorname{cl}(S)$, or else $x_n \ne x$ for all $n$, in which case $x \in \ell(S) \subseteq \operatorname{cl}(S)$. □


Dense Subsets 


The following concept is meant to convey the idea of a subset $S \subseteq M$ being “arbitrarily close” to every point in $M$.

Definition A subset $S$ of a metric space $M$ is dense in $M$ if $\operatorname{cl}(S) = M$. A metric space is said to be separable if it contains a countable dense subset. □


Thus, a subset S of M is dense if every open ball about any point x € M 
contains at least one point of S. 


Certainly, any metric space contains a dense subset, namely, the space itself. 
However, as the next examples show, not every metric space contains a 
countable dense subset. 


Example 12.6 

1) The real line $\mathbb{R}$ is separable, since the rational numbers $\mathbb{Q}$ form a countable dense subset. Similarly, $\mathbb{R}^n$ is separable, since the set $\mathbb{Q}^n$ is countable and dense.

2) The complex plane $\mathbb{C}$ is separable, as is $\mathbb{C}^n$ for all $n$.

3) A discrete metric space is separable if and only if it is countable. We leave 
proof of this as an exercise. 




Example 12.7 The space $\ell^\infty$ is not separable. Recall that $\ell^\infty$ is the set of all bounded sequences of real numbers (or complex numbers) with metric

$$d(x,y) = \sup_n |x_n - y_n|$$

To see that this space is not separable, consider the set $S$ of all binary sequences

$$S = \{(x_n) \mid x_i = 0 \text{ or } 1 \text{ for all } i\}$$

This set is in one-to-one correspondence with the set of all subsets of $\mathbb{N}$ and so is uncountable. (It has cardinality $2^{\aleph_0} > \aleph_0$.) Now, each sequence in $S$ is certainly bounded and so lies in $\ell^\infty$. Moreover, if $x \ne y$ in $S$, then the two sequences must differ in at least one position and so $d(x,y) = 1$.

In other words, we have a subset $S$ of $\ell^\infty$ that is uncountable and for which the distance between any two distinct elements is 1. This implies that the balls in the uncountable collection $\{B(s, 1/3) \mid s \in S\}$ are mutually disjoint. A dense set must meet every one of these balls, but a countable set can meet only countably many mutually disjoint balls. Hence, no countable set can be dense in $\ell^\infty$. □
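The key distance computation in Example 12.7 can be checked on a finite truncation. The sketch below is only an illustration: it uses binary sequences of length 4 as a stand-in for the uncountable set $S$, and the helper name `sup_distance` is our own.

```python
from itertools import product

def sup_distance(x, y):
    """Sup metric on finite sequences of equal length (a truncation of l^infinity)."""
    return max(abs(a - b) for a, b in zip(x, y))

# All binary sequences of length 4, standing in for the set S of the example.
binary_sequences = list(product((0, 1), repeat=4))

# Collect the distances between all pairs of distinct binary sequences.
distances = {
    sup_distance(x, y)
    for x in binary_sequences
    for y in binary_sequences
    if x != y
}
print(distances)
```

Since two distinct binary sequences differ by exactly 1 in some position and by at most 1 everywhere, the only distance that occurs is 1, which is what forces the balls of radius 1/3 to be disjoint.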


Example 12.8 The metric spaces $\ell^p$ are separable, for $p \ge 1$. The set $S$ of all sequences of the form

$$s = (q_1, \ldots, q_n, 0, \ldots)$$

for all $n > 0$, where the $q_i$'s are rational, is a countable set. Let us show that it is dense in $\ell^p$. Any $x \in \ell^p$ satisfies

$$\sum_{n=1}^{\infty} |x_n|^p < \infty$$

Hence, for any $\varepsilon > 0$, there exists an $N$ such that

$$\sum_{n=N+1}^{\infty} |x_n|^p < \frac{\varepsilon}{2}$$

Since the rational numbers are dense in $\mathbb{R}$, we can find rational numbers $q_i$ for which

$$|x_i - q_i|^p < \frac{\varepsilon}{2N}$$

for all $i = 1, \ldots, N$. Hence, if $s = (q_1, \ldots, q_N, 0, \ldots)$, then

$$d(x,s)^p = \sum_{n=1}^{N} |x_n - q_n|^p + \sum_{n=N+1}^{\infty} |x_n|^p < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon$$

which shows that there is an element of $S$ arbitrarily close to any element of $\ell^p$. Thus, $S$ is dense in $\ell^p$ and so $\ell^p$ is separable. □
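The approximation step in Example 12.8 can be tried numerically. The following is a minimal sketch under stated assumptions: the sequence $(1/n)$ is cut off after 2000 terms so that it fits in memory, and the helper names `lp_distance_p` and `rational_approximation` are ours, not the text's.

```python
from fractions import Fraction

def lp_distance_p(x, y, p):
    """Return d(x, y)^p for two finite sequences of equal length."""
    return sum(abs(a - b) ** p for a, b in zip(x, y))

def rational_approximation(x, cutoff, max_denominator):
    """Round the first `cutoff` terms to rationals and replace the rest with 0."""
    head = [Fraction(t).limit_denominator(max_denominator) for t in x[:cutoff]]
    return head + [Fraction(0)] * (len(x) - cutoff)

p = 2
# A long truncation of the sequence (1/n), which lies in l^2.
x = [1.0 / n for n in range(1, 2001)]
s = rational_approximation(x, cutoff=200, max_denominator=10**6)

# The error splits into a small head term plus the tail sum of |x_n|^p.
error_p = lp_distance_p(x, s, p)
print(float(error_p))
```

Increasing `cutoff` shrinks the tail sum, and increasing `max_denominator` shrinks the head term, mirroring the two halves of the $\varepsilon/2 + \varepsilon/2$ estimate.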




Continuity 


Continuity plays a central role in the study of linear operators on infinite- 
dimensional inner product spaces. 


Definition Let $f\colon M \to M'$ be a function from the metric space $(M,d)$ to the metric space $(M',d')$. We say that $f$ is continuous at $x_0 \in M$ if for any $\varepsilon > 0$, there exists a $\delta > 0$ such that

$$d(x, x_0) < \delta \implies d'(f(x), f(x_0)) < \varepsilon$$

or, equivalently,

$$f(B(x_0,\delta)) \subseteq B(f(x_0), \varepsilon)$$

(See Figure 12.2.) A function is continuous if it is continuous at every $x_0 \in M$. □


Figure 12.2 


We can use the notion of convergence to characterize continuity for functions 
between metric spaces. 


Theorem 12.4 A function $f\colon M \to M'$ is continuous if and only if whenever $(x_n)$ is a sequence in $M$ that converges to $x_0 \in M$, then the sequence $(f(x_n))$ converges to $f(x_0)$, in short,

$$(x_n) \to x_0 \implies (f(x_n)) \to f(x_0)$$

Proof. Suppose first that $f$ is continuous at $x_0$ and let $(x_n) \to x_0$. Then, given $\varepsilon > 0$, the continuity of $f$ implies the existence of a $\delta > 0$ such that

$$f(B(x_0,\delta)) \subseteq B(f(x_0),\varepsilon)$$

Since $(x_n) \to x_0$, there exists an $N > 0$ such that $x_n \in B(x_0,\delta)$ for $n > N$ and so

$$n > N \implies f(x_n) \in B(f(x_0),\varepsilon)$$

Thus, $(f(x_n)) \to f(x_0)$.


Conversely, suppose that $(x_n) \to x_0$ implies $(f(x_n)) \to f(x_0)$. Suppose, for the purposes of contradiction, that $f$ is not continuous at $x_0$. Then there exists an $\varepsilon > 0$ such that for all $\delta > 0$,

$$f(B(x_0,\delta)) \not\subseteq B(f(x_0),\varepsilon)$$

Thus, for all $n > 0$,

$$f\left(B\left(x_0, \tfrac{1}{n}\right)\right) \not\subseteq B(f(x_0),\varepsilon)$$

and so we may construct a sequence $(x_n)$ by choosing each term $x_n$ with the property that

$$x_n \in B\left(x_0, \tfrac{1}{n}\right), \text{ but } f(x_n) \notin B(f(x_0),\varepsilon)$$

Hence, $(x_n) \to x_0$, but $(f(x_n))$ does not converge to $f(x_0)$. This contradiction implies that $f$ must be continuous at $x_0$. □


The next theorem says that the distance function is a continuous function in both 
variables. 


Theorem 12.5 Let $(M,d)$ be a metric space. If $(x_n) \to x$ and $(y_n) \to y$, then

$$d(x_n, y_n) \to d(x,y)$$

Proof. We leave it as an exercise to show that

$$|d(x_n, y_n) - d(x,y)| \le d(x_n, x) + d(y_n, y)$$

But the right side tends to 0 as $n \to \infty$ and so $d(x_n, y_n) \to d(x,y)$. □
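The inequality used in the proof of Theorem 12.5 can be observed numerically. This is only a sketch: it takes the Euclidean plane as the metric space, and the particular points and perturbations are arbitrary choices made for illustration.

```python
import math

# Limits x and y in the Euclidean plane.
x, y = (1.0, 2.0), (-3.0, 0.5)

results = []
for n in (1, 10, 100, 1000):
    # Sequences converging to x and y, respectively.
    x_n = (x[0] + 1.0 / n, x[1] - 1.0 / n)
    y_n = (y[0] - 2.0 / n, y[1])
    gap = abs(math.dist(x_n, y_n) - math.dist(x, y))
    bound = math.dist(x_n, x) + math.dist(y_n, y)
    results.append((n, gap, bound))
    print(n, gap, bound)
```

In each row the gap is at most the bound, and both shrink as $n$ grows, which is the content of the theorem.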
Completeness

The reader who has studied analysis will recognize the following definitions.

Definition A sequence $(x_n)$ in a metric space $M$ is a Cauchy sequence if for any $\varepsilon > 0$, there exists an $N > 0$ for which

$$n, m > N \implies d(x_n, x_m) < \varepsilon$$ □

We leave it to the reader to show that any convergent sequence is a Cauchy sequence. When the converse holds, the space is said to be complete.


Definition Let M be a metric space. 

1) M is said to be complete if every Cauchy sequence in M converges in M. 

2) A subspace $S$ of $M$ is complete if it is complete as a metric space. Thus, $S$ is complete if every Cauchy sequence $(s_n)$ in $S$ converges to an element in $S$. □


Before considering examples, we prove a very useful result about completeness 
of subspaces. 




Theorem 12.6 Let M be a metric space. 

1) Any complete subspace of M is closed. 

2) If M is complete, then a subspace S of M is complete if and only if it is 
closed. 

Proof. To prove 1), assume that $S$ is a complete subspace of $M$. Let $(x_n)$ be a sequence in $S$ for which $(x_n) \to x \in M$. Then $(x_n)$ is a Cauchy sequence in $S$ and since $S$ is complete, $(x_n)$ must converge to an element of $S$. Since limits of sequences are unique, we have $x \in S$. Hence, $S$ is closed.

To prove part 2), first assume that $S$ is complete. Then part 1) shows that $S$ is closed. Conversely, suppose that $S$ is closed and let $(x_n)$ be a Cauchy sequence in $S$. Since $(x_n)$ is also a Cauchy sequence in the complete space $M$, it must converge to some $x \in M$. But since $S$ is closed, we have $(x_n) \to x \in S$. Hence, $S$ is complete. □


Now let us consider some examples of complete (and incomplete) metric spaces. 


Example 12.9 It is well known that the metric space R is complete. (However, a 
proof of this fact would lead us outside the scope of this book.) Similarly, the 
complex numbers C are complete. □


Example 12.10 The Euclidean space $\mathbb{R}^n$ and the unitary space $\mathbb{C}^n$ are complete. Let us prove this for $\mathbb{R}^n$. Suppose that $(x_k)$ is a Cauchy sequence in $\mathbb{R}^n$, where

$$x_k = (x_{k,1}, \ldots, x_{k,n})$$

Thus,

$$d(x_k, x_m)^2 = \sum_{i=1}^{n} (x_{k,i} - x_{m,i})^2 \to 0 \text{ as } k, m \to \infty$$

and so, for each coordinate position $i$,

$$(x_{k,i} - x_{m,i})^2 \le d(x_k, x_m)^2 \to 0$$

which shows that the sequence $(x_{k,i})_{k=1,2,\ldots}$ of $i$th coordinates is a Cauchy sequence in $\mathbb{R}$. Since $\mathbb{R}$ is complete, we must have

$$(x_{k,i}) \to y_i \text{ as } k \to \infty$$

If $y = (y_1, \ldots, y_n)$, then

$$d(x_k, y)^2 = \sum_{i=1}^{n} (x_{k,i} - y_i)^2 \to 0 \text{ as } k \to \infty$$

and so $(x_k) \to y \in \mathbb{R}^n$. This proves that $\mathbb{R}^n$ is complete. □




Example 12.11 The metric space $(C[a,b], d)$ of all real-valued (or complex-valued) continuous functions on $[a,b]$, with metric

$$d(f,g) = \sup_{x \in [a,b]} |f(x) - g(x)|$$

is complete. To see this, we first observe that the limit with respect to $d$ is the uniform limit on $[a,b]$, that is, $d(f_n, f) \to 0$ if and only if for any $\varepsilon > 0$, there is an $N > 0$ for which

$$n > N \implies |f_n(x) - f(x)| \le \varepsilon \text{ for all } x \in [a,b]$$

Now let $(f_n)$ be a Cauchy sequence in $(C[a,b], d)$. Thus, for any $\varepsilon > 0$, there is an $N$ for which

$$m, n > N \implies |f_n(x) - f_m(x)| \le \varepsilon \text{ for all } x \in [a,b] \qquad (12.1)$$

This implies that, for each $x \in [a,b]$, the sequence $(f_n(x))$ is a Cauchy sequence of real (or complex) numbers and so it converges. We can therefore define a function $f$ on $[a,b]$ by

$$f(x) = \lim_{n \to \infty} f_n(x)$$

Letting $m \to \infty$ in (12.1), we get

$$n > N \implies |f_n(x) - f(x)| \le \varepsilon \text{ for all } x \in [a,b]$$

Thus, $f_n(x)$ converges to $f(x)$ uniformly. It is well known that the uniform limit of continuous functions is continuous and so $f(x) \in C[a,b]$. Thus, $(f_n(x)) \to f(x) \in C[a,b]$ and so $(C[a,b], d)$ is complete. □


Example 12.12 The metric space $(C[a,b], d_1)$ of all real-valued (or complex-valued) continuous functions on $[a,b]$, with metric

$$d_1(f,g) = \int_a^b |f(x) - g(x)|\,dx$$

is not complete. For convenience, we take $[a,b] = [0,1]$ and leave the general case for the reader. Consider the sequence of functions $f_n(x)$ whose graphs are shown in Figure 12.3. (The definition of $f_n(x)$ should be clear from the graph.)




Figure 12.3 


We leave it to the reader to show that the sequence $(f_n(x))$ is Cauchy, but does not converge in $(C[0,1], d_1)$. (The sequence converges to a function that is not continuous.) □
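Since the graphs of Figure 12.3 are not reproduced here, the following sketch assumes one plausible choice of $f_n$: a ramp that is 0 on $[0, 1/2]$, rises linearly to 1 on $[1/2, 1/2 + 1/n]$ and equals 1 afterwards (for $n \ge 2$). This is a hypothetical stand-in, not necessarily the function in the figure. For these ramps a direct computation gives $d_1(f_n, f_m) = |1/(2n) - 1/(2m)|$, which the code compares with a midpoint-rule approximation.

```python
def f(n, x):
    """Assumed ramp: 0 on [0, 1/2], linear up to 1 on [1/2, 1/2 + 1/n], then 1."""
    return min(1.0, max(0.0, n * (x - 0.5)))

def d1(n, m, steps=20_000):
    """Midpoint-rule approximation of the integral of |f_n - f_m| over [0, 1]."""
    h = 1.0 / steps
    return h * sum(
        abs(f(n, (k + 0.5) * h) - f(m, (k + 0.5) * h)) for k in range(steps)
    )

for n, m in [(2, 4), (10, 20), (100, 200)]:
    exact = abs(1 / (2 * n) - 1 / (2 * m))
    print(n, m, d1(n, m), exact)
```

The distances shrink as $n$ and $m$ grow, consistent with the sequence being Cauchy, while the pointwise limit jumps from 0 to 1 at $x = 1/2$ and so is not continuous.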


Example 12.13 The metric space $\ell^\infty$ is complete. To see this, suppose that $(x_n)$ is a Cauchy sequence in $\ell^\infty$, where

$$x_n = (x_{n,1}, x_{n,2}, \ldots)$$

Then, for each coordinate position $i$, we have

$$|x_{n,i} - x_{m,i}| \le \sup_j |x_{n,j} - x_{m,j}| \to 0 \text{ as } n, m \to \infty \qquad (12.2)$$

Hence, for each $i$, the sequence $(x_{n,i})$ of $i$th coordinates is a Cauchy sequence in $\mathbb{R}$ (or $\mathbb{C}$). Since $\mathbb{R}$ (or $\mathbb{C}$) is complete, we have

$$(x_{n,i}) \to y_i \text{ as } n \to \infty$$

for each coordinate position $i$. We want to show that $y = (y_i) \in \ell^\infty$ and that $(x_n) \to y$.

Letting $m \to \infty$ in (12.2) gives

$$\sup_j |x_{n,j} - y_j| \to 0 \text{ as } n \to \infty \qquad (12.3)$$

and so, for some $n$,

$$|x_{n,j} - y_j| < 1 \text{ for all } j$$

and so

$$|y_j| < 1 + |x_{n,j}| \text{ for all } j$$

But since $x_n \in \ell^\infty$, it is a bounded sequence and therefore so is $(y_j)$. That is, $y = (y_j) \in \ell^\infty$. Since (12.3) implies that $(x_n) \to y$, we see that $\ell^\infty$ is complete. □




Example 12.14 The metric space $\ell^p$ is complete. To prove this, let $(x_n)$ be a Cauchy sequence in $\ell^p$, where

$$x_n = (x_{n,1}, x_{n,2}, \ldots)$$

Then, for each coordinate position $i$,

$$|x_{n,i} - x_{m,i}|^p \le \sum_{j=1}^{\infty} |x_{n,j} - x_{m,j}|^p = d(x_n, x_m)^p \to 0$$

which shows that the sequence $(x_{n,i})$ of $i$th coordinates is a Cauchy sequence in $\mathbb{R}$ (or $\mathbb{C}$). Since $\mathbb{R}$ (or $\mathbb{C}$) is complete, we have

$$(x_{n,i}) \to y_i \text{ as } n \to \infty$$

We want to show that $y = (y_i) \in \ell^p$ and that $(x_n) \to y$.

To this end, observe that for any $\varepsilon > 0$, there is an $N$ for which

$$n, m > N \implies \sum_{i=1}^{r} |x_{n,i} - x_{m,i}|^p \le \varepsilon$$

for all $r > 0$. Now we let $m \to \infty$, to get

$$n > N \implies \sum_{i=1}^{r} |x_{n,i} - y_i|^p \le \varepsilon$$

for all $r > 0$. Letting $r \to \infty$, we get, for any $n > N$,

$$\sum_{i=1}^{\infty} |x_{n,i} - y_i|^p \le \varepsilon$$

which implies that $x_n - y \in \ell^p$ and so $y = y - x_n + x_n \in \ell^p$ and in addition, $(x_n) \to y$. □


As we will see in the next chapter, the property of completeness plays a major 
role in the theory of inner product spaces. Inner product spaces for which the 
induced metric space is complete are called Hilbert spaces. 


Isometries 


A function between two metric spaces that preserves distance is called an 
isometry. Here is the formal definition. 


Definition Let $(M,d)$ and $(M',d')$ be metric spaces. A function $f\colon M \to M'$ is called an isometry if

$$d'(f(x), f(y)) = d(x,y)$$

for all $x, y \in M$. If $f\colon M \to M'$ is a bijective isometry from $M$ to $M'$, we say that $M$ and $M'$ are isometric and write $M \approx M'$. □


Theorem 12.7 Let $f\colon (M,d) \to (M',d')$ be an isometry. Then

1) $f$ is injective

2) $f$ is continuous

3) $f^{-1}\colon f(M) \to M$ is also an isometry and hence also continuous.

Proof. To prove 1), we observe that

$$f(x) = f(y) \iff d'(f(x), f(y)) = 0 \iff d(x,y) = 0 \iff x = y$$

To prove 2), let $(x_n) \to x$ in $M$. Then

$$d'(f(x_n), f(x)) = d(x_n, x) \to 0 \text{ as } n \to \infty$$

and so $(f(x_n)) \to f(x)$, which proves that $f$ is continuous. Finally, we have

$$d(f^{-1}(f(x)), f^{-1}(f(y))) = d(x,y) = d'(f(x), f(y))$$

and so $f^{-1}\colon f(M) \to M$ is an isometry. □
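As a concrete instance of Theorem 12.7, a rotation of the Euclidean plane is an isometry whose inverse (rotation by the opposite angle) is again an isometry. The sketch below is illustrative only; the function names `rotate` and `unrotate`, the angle and the sample points are all our own choices.

```python
import math

def rotate(point, angle):
    """Rotate a point of the plane about the origin by `angle` radians."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)

def unrotate(point, angle):
    """Inverse map: rotate by the opposite angle."""
    return rotate(point, -angle)

angle = 0.7
p, q = (1.0, 2.0), (-3.0, 0.5)

before = math.dist(p, q)
after = math.dist(rotate(p, angle), rotate(q, angle))
round_trip = math.dist(unrotate(rotate(p, angle), angle), p)
print(before, after, round_trip)
```

The first two printed values agree up to rounding, and the round-trip error is essentially zero, in line with parts 1) and 3) of the theorem.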

The Completion of a Metric Space 


While not all metric spaces are complete, any metric space can be embedded in 
a complete metric space. To be more specific, we have the following important 
theorem. 


Theorem 12.8 Let $(M,d)$ be any metric space. Then there is a complete metric space $(M',d')$ and an isometry $\tau\colon M \to \tau M \subseteq M'$ for which $\tau M$ is dense in $M'$. The metric space $(M',d')$ is called a completion of $(M,d)$. Moreover, $(M',d')$ is unique, up to bijective isometry.

Proof. The proof is a bit lengthy, so we divide it into various parts. We can simplify the notation considerably by thinking of sequences $(x_n)$ in $M$ as functions $f\colon \mathbb{N} \to M$, where $f(n) = x_n$.


Cauchy Sequences in M 


The basic idea is to let the elements of $M'$ be equivalence classes of Cauchy sequences in $M$. So let $CS(M)$ denote the set of all Cauchy sequences in $M$. If $f, g \in CS(M)$, then, intuitively speaking, the terms $f(n)$ get closer together as $n \to \infty$ and so do the terms $g(n)$. Therefore, it seems reasonable that $d(f(n), g(n))$ should approach a finite limit as $n \to \infty$. Indeed, since

$$|d(f(n), g(n)) - d(f(m), g(m))| \le d(f(n), f(m)) + d(g(n), g(m)) \to 0$$

as $n, m \to \infty$, it follows that $d(f(n), g(n))$ is a Cauchy sequence of real numbers, which implies that

$$\lim_{n \to \infty} d(f(n), g(n)) < \infty \qquad (12.4)$$

(That is, the limit exists and is finite.)

Equivalence Classes of Cauchy Sequences in M

We would like to define a metric $d'$ on the set $CS(M)$ by

$$d'(f,g) = \lim_{n \to \infty} d(f(n), g(n))$$

However, it is possible that

$$\lim_{n \to \infty} d(f(n), g(n)) = 0$$

for distinct sequences $f$ and $g$, so this does not define a metric. Thus, we are led to define an equivalence relation on $CS(M)$ by

$$f \sim g \iff \lim_{n \to \infty} d(f(n), g(n)) = 0$$

Let $\overline{CS(M)}$ be the set of all equivalence classes of Cauchy sequences and define, for $\bar{f}, \bar{g} \in \overline{CS(M)}$,

$$d'(\bar{f}, \bar{g}) = \lim_{n \to \infty} d(f(n), g(n)) \qquad (12.5)$$

where $f \in \bar{f}$ and $g \in \bar{g}$.

To see that $d'$ is well-defined, suppose that $f' \in \bar{f}$ and $g' \in \bar{g}$. Then since $f' \sim f$ and $g' \sim g$, we have

$$|d(f'(n), g'(n)) - d(f(n), g(n))| \le d(f'(n), f(n)) + d(g'(n), g(n)) \to 0$$

as $n \to \infty$. Thus,

$$f' \sim f \text{ and } g' \sim g \implies \lim_{n \to \infty} d(f'(n), g'(n)) = \lim_{n \to \infty} d(f(n), g(n)) \implies d'(\bar{f'}, \bar{g'}) = d'(\bar{f}, \bar{g})$$


which shows that $d'$ is well-defined. To see that $d'$ is a metric, we verify the triangle inequality, leaving the rest to the reader. If $f$, $g$ and $h$ are Cauchy sequences, then

$$d(f(n), g(n)) \le d(f(n), h(n)) + d(h(n), g(n))$$

Taking limits gives

$$\lim_{n \to \infty} d(f(n), g(n)) \le \lim_{n \to \infty} d(f(n), h(n)) + \lim_{n \to \infty} d(h(n), g(n))$$

and so

$$d'(\bar{f}, \bar{g}) \le d'(\bar{f}, \bar{h}) + d'(\bar{h}, \bar{g})$$
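To make the equivalence relation concrete, take $M = \mathbb{Q}$ with the usual distance. The sketch below builds two different Cauchy sequences of rationals, Newton iterates and decimal truncations, both of which approach $\sqrt{2}$; the function names are ours and the choice of sequences is just one illustration.

```python
from fractions import Fraction
from math import isqrt

def newton(n):
    """n-th Newton iterate for x^2 = 2, starting from 1 (always a rational)."""
    x = Fraction(1)
    for _ in range(n):
        x = (x + 2 / x) / 2
    return x

def truncation(n):
    """Decimal expansion of sqrt(2) cut off after n digits (also a rational)."""
    return Fraction(isqrt(2 * 10 ** (2 * n)), 10 ** n)

# d(f(n), g(n)) for the two sequences f = newton and g = truncation.
gaps = [abs(newton(n) - truncation(n)) for n in range(1, 7)]
for n, gap in enumerate(gaps, start=1):
    print(n, float(gap))
```

The gaps tend to 0, so the two sequences are equivalent and represent the same point of the completion, even though that point (here $\sqrt{2}$) is not in $\mathbb{Q}$.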


Embedding $(M,d)$ in $(M',d')$

For each $x \in M$, consider the constant Cauchy sequence $[x]$, where $[x](n) = x$ for all $n$. The map $\tau\colon M \to M'$ defined by

$$\tau x = \overline{[x]}$$

is an isometry, since

$$d'(\tau x, \tau y) = d'(\overline{[x]}, \overline{[y]}) = \lim_{n \to \infty} d([x](n), [y](n)) = d(x,y)$$

Moreover, $\tau M$ is dense in $M'$. This follows from the fact that we can approximate any Cauchy sequence in $M$ by a constant sequence. In particular, let $\bar{f} \in M'$. Since $f \in \bar{f}$ is a Cauchy sequence, for any $\varepsilon > 0$, there exists an $N$ such that

$$n, m \ge N \implies d(f(n), f(m)) < \varepsilon$$

Now, for the constant sequence $[f(N)]$ we have

$$d'(\overline{[f(N)]}, \bar{f}) = \lim_{n \to \infty} d(f(N), f(n)) \le \varepsilon$$

and so $\tau M$ is dense in $M'$.
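$(M',d')$ Is Complete

Suppose that

$$\bar{f}_1, \bar{f}_2, \bar{f}_3, \ldots$$

is a Cauchy sequence in $M'$. We wish to find a Cauchy sequence $g$ in $M$ for which

$$d'(\bar{f}_k, \bar{g}) = \lim_{n \to \infty} d(f_k(n), g(n)) \to 0 \text{ as } k \to \infty$$

Since $\bar{f}_k \in M'$ and since $\tau M$ is dense in $M'$, there is a constant sequence

$$[c_k] = (c_k, c_k, \ldots)$$

for which

$$d'(\bar{f}_k, \overline{[c_k]}) < \frac{1}{k}$$

We can think of $c_k$ as a constant approximation to $f_k$, with error at most $1/k$. Let $g$ be the sequence of these constant approximations:

$$g(k) = c_k$$

This is a Cauchy sequence in $M$. Intuitively speaking, since the $\bar{f}_k$'s get closer to each other as $k \to \infty$, so do the constant approximations. In particular, we have

$$d(c_k, c_j) = d'(\overline{[c_k]}, \overline{[c_j]}) \le d'(\overline{[c_k]}, \bar{f}_k) + d'(\bar{f}_k, \bar{f}_j) + d'(\bar{f}_j, \overline{[c_j]}) \le \frac{1}{k} + d'(\bar{f}_k, \bar{f}_j) + \frac{1}{j} \to 0$$

as $k, j \to \infty$. To see that $\bar{f}_k$ converges to $\bar{g}$, observe that

$$d'(\bar{f}_k, \bar{g}) \le d'(\bar{f}_k, \overline{[c_k]}) + d'(\overline{[c_k]}, \bar{g}) < \frac{1}{k} + \lim_{n \to \infty} d(c_k, g(n)) = \frac{1}{k} + \lim_{n \to \infty} d(c_k, c_n)$$

Now, since $g$ is a Cauchy sequence, for any $\varepsilon > 0$, there is an $N$ such that

$$k, n \ge N \implies d(c_k, c_n) < \varepsilon$$

In particular,

$$k \ge N \implies \lim_{n \to \infty} d(c_k, c_n) \le \varepsilon$$

and so

$$k \ge N \implies d'(\bar{f}_k, \bar{g}) < \frac{1}{k} + \varepsilon$$

which implies that $\bar{f}_k \to \bar{g}$, as desired.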


Uniqueness 


Finally, we must show that if $(M',d')$ and $(M'',d'')$ are both completions of $(M,d)$, then $M' \approx M''$. Note that we have bijective isometries

$$\tau\colon M \to \tau M \subseteq M' \text{ and } \sigma\colon M \to \sigma M \subseteq M''$$

Hence, the map

$$\rho = \sigma\tau^{-1}\colon \tau M \to \sigma M$$

is a bijective isometry from $\tau M$ onto $\sigma M$, where $\tau M$ is dense in $M'$. (See Figure 12.4.)




Figure 12.4 


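Our goal is to show that $\rho$ can be extended to a bijective isometry $\bar{\rho}$ from $M'$ to $M''$.

Let $x \in M'$. Then there is a sequence $(a_n)$ in $\tau M$ for which $(a_n) \to x$. Since $(a_n)$ is a Cauchy sequence in $\tau M$, $(\rho(a_n))$ is a Cauchy sequence in $\sigma M \subseteq M''$ and since $M''$ is complete, we have $(\rho(a_n)) \to y$ for some $y \in M''$. Let us define $\bar{\rho}(x) = y$.

To see that $\bar{\rho}$ is well-defined, suppose that $(a_n) \to x$ and $(b_n) \to x$, where both sequences lie in $\tau M$. Then

$$d''(\rho(a_n), \rho(b_n)) = d'(a_n, b_n) \to 0 \text{ as } n \to \infty$$

and so $(\rho(a_n))$ and $(\rho(b_n))$ converge to the same element of $M''$, which implies that $\bar{\rho}(x)$ does not depend on the choice of sequence in $\tau M$ converging to $x$. Thus, $\bar{\rho}$ is well-defined. Moreover, if $a \in \tau M$, then the constant sequence $[a]$ converges to $a$ and so $\bar{\rho}(a) = \lim \rho(a) = \rho(a)$, which shows that $\bar{\rho}$ is an extension of $\rho$.

To see that $\bar{\rho}$ is an isometry, suppose that $(a_n) \to x$ and $(b_n) \to y$. Then $(\rho(a_n)) \to \bar{\rho}(x)$ and $(\rho(b_n)) \to \bar{\rho}(y)$ and since $d''$ is continuous, we have

$$d''(\bar{\rho}(x), \bar{\rho}(y)) = \lim_{n \to \infty} d''(\rho(a_n), \rho(b_n)) = \lim_{n \to \infty} d'(a_n, b_n) = d'(x,y)$$

Thus, we need only show that $\bar{\rho}$ is surjective. Note first that $\sigma M = \operatorname{im}(\rho) \subseteq \operatorname{im}(\bar{\rho})$. Thus, if $\operatorname{im}(\bar{\rho})$ is closed, we can deduce from the fact that $\sigma M$ is dense in $M''$ that $\operatorname{im}(\bar{\rho}) = M''$. So, suppose that $(\bar{\rho}(x_n))$ is a sequence in $\operatorname{im}(\bar{\rho})$ and $(\bar{\rho}(x_n)) \to z$. Then $(\bar{\rho}(x_n))$ is a Cauchy sequence and, since $\bar{\rho}$ is an isometry, so is $(x_n)$. Because $M'$ is complete, $(x_n) \to x$ for some $x \in M'$. But $\bar{\rho}$ is continuous and so $(\bar{\rho}(x_n)) \to \bar{\rho}(x)$, which implies that $\bar{\rho}(x) = z$ and so $z \in \operatorname{im}(\bar{\rho})$. Hence, $\bar{\rho}$ is surjective and $M' \approx M''$. □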




Exercises

1. Prove the generalized triangle inequality
$$d(x_1, x_n) \le d(x_1, x_2) + d(x_2, x_3) + \cdots + d(x_{n-1}, x_n)$$
2. a) Use the triangle inequality to prove that
$$|d(x,y) - d(a,b)| \le d(x,a) + d(y,b)$$
   b) Prove that
$$|d(x,z) - d(y,z)| \le d(x,y)$$
3. Let $S \subseteq \ell^\infty$ be the subspace of all binary sequences (sequences of 0's and 1's). Describe the metric on $S$.
4. Let $M = \{0,1\}^n$ be the set of all binary $n$-tuples. Define a function $h\colon M \times M \to \mathbb{R}$ by letting $h(x,y)$ be the number of positions in which $x$ and $y$ differ. For example, $h[(11010), (01001)] = 3$. Prove that $h$ is a metric. (It is called the Hamming distance function and plays an important role in the theory of error-correcting codes.)
5. Let $1 \le p < \infty$.
   a) If $x = (x_n) \in \ell^p$, show that $x_n \to 0$.
   b) Find a sequence that converges to 0 but is not an element of any $\ell^p$ for $1 \le p < \infty$.
6. a) Show that if $x = (x_n) \in \ell^p$, then $x \in \ell^q$ for all $q > p$.
   b) Find a sequence $x = (x_n)$ that is in $\ell^p$ for $p > 1$, but is not in $\ell^1$.
7. Show that a subset $S$ of a metric space $M$ is open if and only if $S$ contains an open neighborhood of each of its points.
8. Show that the intersection of any collection of closed sets in a metric space is closed.
9. Let $(M,d)$ be a metric space. The diameter of a nonempty subset $S \subseteq M$ is
$$\delta(S) = \sup_{x,y \in S} d(x,y)$$
   A set $S$ is bounded if $\delta(S) < \infty$.
   a) Prove that $S$ is bounded if and only if there is some $x \in M$ and $r \in \mathbb{R}$ for which $S \subseteq B(x,r)$.
   b) Prove that $\delta(S) = 0$ if and only if $S$ consists of a single point.
   c) Prove that $S \subseteq T$ implies $\delta(S) \le \delta(T)$.
   d) If $S$ and $T$ are bounded, show that $S \cup T$ is also bounded.
10. Let $(M,d)$ be a metric space. Let $d'$ be the function defined by
$$d'(x,y) = \frac{d(x,y)}{1 + d(x,y)}$$

   a) Show that $(M,d')$ is a metric space and that $M$ is bounded under this metric, even if it is not bounded under the metric $d$.
   b) Show that the metric spaces $(M,d)$ and $(M,d')$ have the same open sets.
11. If $S$ and $T$ are subsets of a metric space $(M,d)$, we define the distance between $S$ and $T$ by
$$\rho(S,T) = \inf_{x \in S,\, y \in T} d(x,y)$$
   a) Is it true that $\rho(S,T) = 0$ if and only if $S = T$? Is $\rho$ a metric?
   b) Show that $x \in \operatorname{cl}(S)$ if and only if $\rho(\{x\}, S) = 0$.
12. Prove that $x \in M$ is a limit point of $S \subseteq M$ if and only if every neighborhood of $x$ meets $S$ in a point other than $x$ itself.
13. Prove that $x \in M$ is a limit point of $S \subseteq M$ if and only if every open ball $B(x,r)$ contains infinitely many points of $S$.
14. Prove that limits are unique, that is, $(x_n) \to x$, $(x_n) \to y$ implies that $x = y$.
15. Let $S$ be a subset of a metric space $M$. Prove that $x \in \operatorname{cl}(S)$ if and only if there exists a sequence $(x_n)$ in $S$ that converges to $x$.
16. Prove that the closure has the following properties:
   a) $S \subseteq \operatorname{cl}(S)$
   b) $\operatorname{cl}(\operatorname{cl}(S)) = \operatorname{cl}(S)$
   c) $\operatorname{cl}(S \cup T) = \operatorname{cl}(S) \cup \operatorname{cl}(T)$
   d) $\operatorname{cl}(S \cap T) \subseteq \operatorname{cl}(S) \cap \operatorname{cl}(T)$
   Can the last part be strengthened to equality?
17. a) Prove that the closed ball $\overline{B}(x,r)$ is always a closed subset.
   b) Find an example of a metric space in which the closure of an open ball $B(x,r)$ is not equal to the closed ball $\overline{B}(x,r)$.
18. Provide the details to show that $\mathbb{R}^n$ is separable.
19. Prove that $\mathbb{C}^n$ is separable.
20. Prove that a discrete metric space is separable if and only if it is countable.
21. Prove that the metric space $B[a,b]$ of all bounded functions on $[a,b]$, with metric
$$d(f,g) = \sup_{x \in [a,b]} |f(x) - g(x)|$$
   is not separable.
22. Show that a function $f\colon (M,d) \to (M',d')$ is continuous if and only if the inverse image of any open set is open, that is, if and only if $f^{-1}(U) = \{x \in M \mid f(x) \in U\}$ is open in $M$ whenever $U$ is an open set in $M'$.
23. Repeat the previous exercise, replacing the word open by the word closed.
24. Give an example to show that if $f\colon (M,d) \to (M',d')$ is a continuous function and $U$ is an open set in $M$, it need not be the case that $f(U)$ is open in $M'$.


25. Show that any convergent sequence is a Cauchy sequence.
26. If $(x_n) \to x$ in a metric space $M$, show that any subsequence $(x_{n_k})$ of $(x_n)$ also converges to $x$.
27. Suppose that $(x_n)$ is a Cauchy sequence in a metric space $M$ and that some subsequence $(x_{n_k})$ of $(x_n)$ converges. Prove that $(x_n)$ converges to the same limit as the subsequence.
28. Prove that if $(x_n)$ is a Cauchy sequence, then the set $\{x_n\}$ is bounded. What about the converse? Is a bounded sequence necessarily a Cauchy sequence?
29. Let $(x_n)$ and $(y_n)$ be Cauchy sequences in a metric space $M$. Prove that the sequence $d_n = d(x_n, y_n)$ converges.
30. Show that the space of all convergent sequences of real numbers (or complex numbers) is complete as a subspace of $\ell^\infty$.
31. Let $P$ denote the metric space of all polynomials over $\mathbb{C}$, with metric
$$d(p,q) = \sup_{x \in [a,b]} |p(x) - q(x)|$$
   Is $P$ complete?
32. Let $S \subseteq \ell^\infty$ be the subspace of all sequences with finite support (that is, with a finite number of nonzero terms). Is $S$ complete?
33. Prove that the metric space $\mathbb{Z}$ of all integers, with metric $d(n,m) = |n - m|$, is complete.
34. Show that the subspace $S$ of the metric space $C[a,b]$ (under the sup metric) consisting of all functions $f \in C[a,b]$ for which $f(a) = f(b)$ is complete.
35. If $M \approx M'$ and $M$ is complete, show that $M'$ is also complete.
36. Show that the metric spaces $C[a,b]$ and $C[c,d]$, under the sup metric, are isometric.
37. Prove Hölder's inequality
$$\sum_{n=1}^{\infty} |x_n y_n| \le \left(\sum_{n=1}^{\infty} |x_n|^p\right)^{1/p} \left(\sum_{n=1}^{\infty} |y_n|^q\right)^{1/q}$$
   as follows:
   a) Show that $s = t^{p-1} \implies t = s^{q-1}$.
   b) Let $u$ and $v$ be positive real numbers and consider the rectangle $R$ in $\mathbb{R}^2$ with corners $(0,0)$, $(u,0)$, $(0,v)$ and $(u,v)$, with area $uv$. Argue geometrically (that is, draw a picture) to show that
$$uv \le \int_0^u t^{p-1}\,dt + \int_0^v s^{q-1}\,ds$$
   and so
$$uv \le \frac{u^p}{p} + \frac{v^q}{q}$$
   c) Now let $X = \sum |x_n|^p < \infty$ and $Y = \sum |y_n|^q < \infty$. Apply the results of part b) to
$$u = \frac{|x_n|}{X^{1/p}}, \quad v = \frac{|y_n|}{Y^{1/q}}$$
   and then sum on $n$ to deduce Hölder's inequality.
38. Prove Minkowski's inequality
$$\left(\sum_{n=1}^{\infty} |x_n + y_n|^p\right)^{1/p} \le \left(\sum_{n=1}^{\infty} |x_n|^p\right)^{1/p} + \left(\sum_{n=1}^{\infty} |y_n|^p\right)^{1/p}$$
   as follows:
   a) Prove it for $p = 1$ first.
   b) Assume $p > 1$. Show that
$$|x_n + y_n|^p \le |x_n|\,|x_n + y_n|^{p-1} + |y_n|\,|x_n + y_n|^{p-1}$$
   c) Sum this from $n = 1$ to $k$ and apply Hölder's inequality to each sum on the right, to get
$$\sum_{n=1}^{k} |x_n + y_n|^p \le \left[\left(\sum_{n=1}^{k} |x_n|^p\right)^{1/p} + \left(\sum_{n=1}^{k} |y_n|^p\right)^{1/p}\right] \left(\sum_{n=1}^{k} |x_n + y_n|^p\right)^{1/q}$$
   Divide both sides of this by the last factor on the right and let $k \to \infty$ to deduce Minkowski's inequality.
39. Prove that $\ell^p$ is a metric space.


Chapter 13 
Hilbert Spaces 


Now that we have the necessary background on the topological properties of 
metric spaces, we can resume our study of inner product spaces without 
qualification as to dimension. As in Chapter 9, we restrict attention to real and 
complex inner product spaces. Hence F will denote either R or C. 


A Brief Review 


Let us begin by reviewing some of the results from Chapter 9. Recall that an inner product space $V$ over $F$ is a vector space $V$, together with an inner product $\langle\,,\rangle\colon V \times V \to F$. If $F = \mathbb{R}$, then the inner product is bilinear and if $F = \mathbb{C}$, the inner product is sesquilinear.

An inner product induces a norm on $V$, defined by

$$\|v\| = \sqrt{\langle v, v\rangle}$$

We recall in particular the following properties of the norm.


Theorem 13.1
1) (The Cauchy-Schwarz inequality) For all $u, v \in V$,

$$|\langle u, v\rangle| \le \|u\|\,\|v\|$$

with equality if and only if $u = rv$ for some $r \in F$.
2) (The triangle inequality) For all $u, v \in V$,

$$\|u + v\| \le \|u\| + \|v\|$$

with equality if and only if $u = rv$ for some $r \in F$.
3) (The parallelogram law)

$$\|u + v\|^2 + \|u - v\|^2 = 2\|u\|^2 + 2\|v\|^2$$ □


We have seen that the inner product can be recovered from the norm, as follows. 




Theorem 13.2
1) If $V$ is a real inner product space, then

$$\langle u, v\rangle = \frac{1}{4}\left(\|u + v\|^2 - \|u - v\|^2\right)$$

2) If $V$ is a complex inner product space, then

$$\langle u, v\rangle = \frac{1}{4}\left(\|u + v\|^2 - \|u - v\|^2\right) + \frac{1}{4}i\left(\|u + iv\|^2 - \|u - iv\|^2\right)$$ □
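The complex polarization identity can be checked numerically in $\mathbb{C}^3$. The sketch below assumes the standard inner product $\langle u, v\rangle = \sum u_k \overline{v_k}$ (linear in the first coordinate, conjugate-linear in the second); the helper names and the sample vectors are ours.

```python
def inner(u, v):
    """Standard inner product on C^n, conjugate-linear in the second argument."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def norm_sq(u):
    """Squared norm induced by the inner product."""
    return inner(u, u).real

def polarization(u, v):
    """Recover <u, v> from norms alone, via the complex polarization identity."""
    def combine(scalar):
        return [a + scalar * b for a, b in zip(u, v)]
    real_part = (norm_sq(combine(1)) - norm_sq(combine(-1))) / 4
    imag_part = (norm_sq(combine(1j)) - norm_sq(combine(-1j))) / 4
    return complex(real_part, imag_part)

u = [1 + 2j, -0.5j, 3.0 + 0j]
v = [2 - 1j, 4 + 0j, 1j]
print(inner(u, v), polarization(u, v))
```

The two printed values agree up to rounding. If the opposite convention (conjugate-linear in the first coordinate) were used, the sign of the imaginary term in the identity would change.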


The inner product also induces a metric on $V$ defined by

$$d(u,v) = \|u - v\|$$

Thus, any inner product space is a metric space.

Definition Let $V$ and $W$ be inner product spaces and let $\tau \in \mathcal{L}(V,W)$.
1) $\tau$ is an isometry if it preserves the inner product, that is, if

$$\langle \tau u, \tau v\rangle = \langle u, v\rangle$$

for all $u, v \in V$.
2) A bijective isometry is called an isometric isomorphism. When $\tau\colon V \to W$ is an isometric isomorphism, we say that $V$ and $W$ are isometrically isomorphic. □

It is easy to see that an isometry is always injective but need not be surjective, even if $V = W$.

Theorem 13.3 A linear transformation $\tau \in \mathcal{L}(V,W)$ is an isometry if and only if it preserves the norm, that is, if and only if

$$\|\tau v\| = \|v\|$$

for all $v \in V$. □


The following result points out one of the main differences between real and 
complex inner product spaces. 


Theorem 13.4 Let $V$ be an inner product space and let $\tau \in \mathcal{L}(V)$.

1) If $\langle \tau v, w\rangle = 0$ for all $v, w \in V$, then $\tau = 0$.

2) If $V$ is a complex inner product space and $Q_\tau(v) = \langle \tau v, v\rangle = 0$ for all $v \in V$, then $\tau = 0$.

3) Part 2) does not hold in general for real inner product spaces. □


Hilbert Spaces 


Since an inner product space is a metric space, all that we learned about metric spaces applies to inner product spaces. In particular, if $(x_n)$ is a sequence of vectors in an inner product space $V$, then

$$(x_n) \to x \text{ if and only if } \|x_n - x\| \to 0 \text{ as } n \to \infty$$

The fact that the inner product is continuous as a function of either of its coordinates is extremely useful.

Theorem 13.5 Let $V$ be an inner product space. Then

1) $(x_n) \to x,\ (y_n) \to y \implies \langle x_n, y_n\rangle \to \langle x, y\rangle$
2) $(x_n) \to x \implies \|x_n\| \to \|x\|$ □


Complete inner product spaces play an especially important role in both theory 
and practice. 


Definition An inner product space that is complete under the metric induced by 
the inner product is said to be a Hilbert space. O 


Example 13.1 One of the most important examples of a Hilbert space is the space $\ell^2$. Recall that the inner product is defined by

$$\langle x, y\rangle = \sum_{n=1}^{\infty} x_n \overline{y_n}$$

(In the real case, the conjugate is unnecessary.) The metric induced by this inner product is

$$d(x,y) = \|x - y\|_2 = \left(\sum_{n=1}^{\infty} |x_n - y_n|^2\right)^{1/2}$$

which agrees with the definition of the metric space $\ell^2$ given in Chapter 12. In other words, the metric in Chapter 12 is induced by this inner product. As we saw in Chapter 12, this inner product space is complete and so it is a Hilbert space. (In fact, it is the prototype of all Hilbert spaces, introduced by David Hilbert in 1912, even before the axiomatic definition of Hilbert space was given by John von Neumann in 1927.) □


The previous example raises the question whether the other metric spaces $\ell^p$ ($p \ne 2$), with distance given by

$$d(x,y) = \|x - y\|_p = \left(\sum_{n=1}^{\infty} |x_n - y_n|^p\right)^{1/p} \qquad (13.1)$$

are complete inner product spaces. The fact is that they are not even inner product spaces! More specifically, there is no inner product whose induced metric is given by (13.1). To see this, observe that, according to Theorem 13.1, any norm that comes from an inner product must satisfy the parallelogram law

$$\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2$$

But the norm in (13.1) does not satisfy this law. To see this, take $x = (1,1,0,\ldots)$ and $y = (1,-1,0,\ldots)$. Then

$$\|x + y\|_p = 2, \quad \|x - y\|_p = 2$$

and

$$\|x\|_p = 2^{1/p}, \quad \|y\|_p = 2^{1/p}$$

Thus, the left side of the parallelogram law is 8 and the right side is $4 \cdot 2^{2/p}$, which equals 8 if and only if $p = 2$.
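The computation above is easy to reproduce. The sketch below evaluates both sides of the parallelogram law for the same vectors $x = (1,1,0,\ldots)$ and $y = (1,-1,0,\ldots)$, keeping only their first two coordinates; the function names are our own.

```python
def p_norm(x, p):
    """The l^p norm of a finite sequence."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

# The vectors of the text, truncated to their two nonzero coordinates.
x, y = (1.0, 1.0), (1.0, -1.0)

def sides(p):
    """Return (left, right) sides of the parallelogram law for the p-norm."""
    plus = [a + b for a, b in zip(x, y)]
    minus = [a - b for a, b in zip(x, y)]
    left = p_norm(plus, p) ** 2 + p_norm(minus, p) ** 2
    right = 2 * p_norm(x, p) ** 2 + 2 * p_norm(y, p) ** 2
    return left, right

for p in (1, 1.5, 2, 3, 10):
    print(p, sides(p))
```

The left side is 8 for every $p$, while the right side is $4 \cdot 2^{2/p}$, so the two agree only in the row for $p = 2$.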


Just as any metric space has a completion, so does any inner product space. 


Theorem 13.6 Let V be an inner product space. Then there exists a Hilbert 
space H and an isometry t:V — H for which TV is dense in H. Moreover, H 
is unique up to isometric isomorphism. 

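Proof. We know that the metric space $(V, d)$, where $d$ is induced by the inner
product, has a unique completion $(V', d')$, which consists of equivalence classes
of Cauchy sequences in $V$. If $(x_n) \in \overline{(x_n)} \in V'$ and $(y_n) \in \overline{(y_n)} \in V'$, then we
set

\[ \overline{(x_n)} + \overline{(y_n)} = \overline{(x_n + y_n)}, \quad r\,\overline{(x_n)} = \overline{(r x_n)} \]

and

\[ \langle \overline{(x_n)}, \overline{(y_n)} \rangle = \lim_{n \to \infty} \langle x_n, y_n \rangle \]

It is easy to see that since $(x_n)$ and $(y_n)$ are Cauchy sequences, so are $(x_n + y_n)$
and $(r x_n)$. In addition, these definitions are well-defined, that is, they are
independent of the choice of representative from each equivalence class. For
instance, if $(\hat{x}_n) \in \overline{(x_n)}$, then

\[ \lim_{n \to \infty} \|x_n - \hat{x}_n\| = 0 \]

and so

\[ |\langle x_n, y_n \rangle - \langle \hat{x}_n, y_n \rangle| = |\langle x_n - \hat{x}_n, y_n \rangle| \le \|x_n - \hat{x}_n\| \, \|y_n\| \to 0 \]

(The Cauchy sequence $(y_n)$ is bounded.) Hence,

\[ \langle \overline{(x_n)}, \overline{(y_n)} \rangle = \lim_{n \to \infty} \langle x_n, y_n \rangle = \lim_{n \to \infty} \langle \hat{x}_n, y_n \rangle = \langle \overline{(\hat{x}_n)}, \overline{(y_n)} \rangle \]

We leave it to the reader to show that $V'$ is an inner product space under these
operations.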




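Moreover, the inner product on $V'$ induces the metric $d'$, since

\[ \langle \overline{(x_n)} - \overline{(y_n)}, \overline{(x_n)} - \overline{(y_n)} \rangle = \lim_{n \to \infty} \langle x_n - y_n, x_n - y_n \rangle = \lim_{n \to \infty} d(x_n, y_n)^2 = d'(\overline{(x_n)}, \overline{(y_n)})^2 \]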


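Hence, the metric space isometry $\tau: V \to V'$ is an isometry of inner product
spaces, since

\[ \langle \tau x - \tau y, \tau x - \tau y \rangle = d'(\tau x, \tau y)^2 = d(x, y)^2 = \langle x - y, x - y \rangle \]

so that $\tau$ preserves norms and therefore, by polarization, inner products.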


Thus, $V'$ is a complete inner product space and $\tau V$ is a dense subspace of $V'$
that is isometrically isomorphic to $V$. We leave the issue of uniqueness to the
reader. □


The next result concerns subspaces of inner product spaces. 


Theorem 13.7 

1) Any complete subspace of an inner product space is closed. 

2) A subspace of a Hilbert space is a Hilbert space if and only if it is closed. 

3) Any finite-dimensional subspace of an inner product space is closed and 
complete. 

Proof. Parts 1) and 2) follow from Theorem 12.6. Let us prove that a
finite-dimensional subspace $S$ of an inner product space $V$ is closed. Suppose that
$(x_n)$ is a sequence in $S$, $(x_n) \to x$ and $x \notin S$. Let $B = \{b_1, \ldots, b_m\}$ be an
orthonormal Hamel basis for $S$. The Fourier expansion

\[ s = \sum_{i=1}^{m} \langle x, b_i \rangle b_i \]

in $S$ has the property that $x - s \ne 0$ but

\[ \langle x - s, b_j \rangle = \langle x, b_j \rangle - \langle s, b_j \rangle = 0 \]

Thus, if we write $y = x - s$ and $y_n = x_n - s \in S$, the sequence $(y_n)$, which is
in $S$, converges to a vector $y$ that is orthogonal to $S$. But this is impossible,
because $y_n \perp y$ implies that

\[ \|y_n - y\|^2 = \|y_n\|^2 + \|y\|^2 \ge \|y\|^2 \not\to 0 \]

This proves that $S$ is closed.


To see that any finite-dimensional subspace $S$ of an inner product space is
complete, let us embed $S$ (as an inner product space in its own right) in its
completion $S'$. Then $S$ (or rather an isometric copy of $S$) is a finite-dimensional
subspace of a complete inner product space $S'$ and as such it is closed.
However, $S$ is dense in $S'$ and so $S = S'$, which shows that $S$ is complete. □


Infinite Series 
Since an inner product space allows both addition of vectors and convergence of 
sequences, we can define the concept of infinite sums, or infinite series. 
Definition Let $V$ be an inner product space. The $n$th partial sum of the
sequence $(x_k)$ in $V$ is

\[ s_n = x_1 + \cdots + x_n \]

If the sequence $(s_n)$ of partial sums converges to a vector $s \in V$, that is, if

\[ \|s_n - s\| \to 0 \text{ as } n \to \infty \]

then we say that the series $\sum x_k$ converges to $s$ and write

\[ \sum_{k=1}^{\infty} x_k = s \]

□


We can also define absolute convergence. 


Definition A series $\sum x_k$ is said to be absolutely convergent if the series

\[ \sum_{k=1}^{\infty} \|x_k\| \]

converges. □


The key relationship between convergence and absolute convergence is given in 
the next theorem. Note that completeness is required to guarantee that absolute 
convergence implies convergence. 


Theorem 13.8 Let V be an inner product space. Then V is complete if and only 
if absolute convergence of a series implies convergence. 

Proof. Suppose that $V$ is complete and that $\sum \|x_k\| < \infty$. Then the sequence $(s_n)$
of partial sums is a Cauchy sequence, for if $n > m$, we have

\[ \|s_n - s_m\| = \left\| \sum_{k=m+1}^{n} x_k \right\| \le \sum_{k=m+1}^{n} \|x_k\| \to 0 \]

Hence, the sequence $(s_n)$ converges, that is, the series $\sum x_k$ converges.


Conversely, suppose that absolute convergence implies convergence and let
$(x_n)$ be a Cauchy sequence in $V$. We wish to show that this sequence
converges. Since $(x_n)$ is a Cauchy sequence, for each $k > 0$, there exists an $N_k$
with the property that

\[ i, j \ge N_k \Rightarrow \|x_i - x_j\| < \frac{1}{2^k} \]

Clearly, we can choose $N_1 < N_2 < \cdots$, in which case

\[ \|x_{N_{k+1}} - x_{N_k}\| < \frac{1}{2^k} \]

and so

\[ \sum_{k=1}^{\infty} \|x_{N_{k+1}} - x_{N_k}\| \le \sum_{k=1}^{\infty} \frac{1}{2^k} < \infty \]

Thus, according to hypothesis, the series

\[ \sum_{k=1}^{\infty} (x_{N_{k+1}} - x_{N_k}) \]

converges. But this is a telescoping series, whose $n$th partial sum is

\[ x_{N_{n+1}} - x_{N_1} \]

and so the subsequence $(x_{N_k})$ converges. Since any Cauchy sequence that has a
convergent subsequence must itself converge, the sequence $(x_n)$ converges and
so $V$ is complete. □


An Approximation Problem 


Suppose that $V$ is an inner product space and that $S$ is a subset of $V$. It is of
considerable interest to be able to find, for any $x \in V$, a vector in $S$ that is
closest to $x$ in the metric induced by the inner product, should such a vector
exist. This is the approximation problem for $V$.

Suppose that $x \in V$ and let

\[ \delta = \inf_{s \in S} \|x - s\| \]

Then there is a sequence $(s_n)$ in $S$ for which

\[ \delta_n = \|x - s_n\| \to \delta \]

as shown in Figure 13.1.


Figure 13.1
Let us see what we can learn about this sequence. First, if we let $y_k = x - s_k$,
then according to the parallelogram law,

\[ \|y_k + y_j\|^2 + \|y_k - y_j\|^2 = 2(\|y_k\|^2 + \|y_j\|^2) \]

or

\[ \|y_k - y_j\|^2 = 2(\|y_k\|^2 + \|y_j\|^2) - 4\left\| \frac{y_k + y_j}{2} \right\|^2 \tag{13.2} \]

Now, if the set $S$ is convex, that is, if

\[ x, y \in S \Rightarrow rx + (1 - r)y \in S \text{ for all } 0 \le r \le 1 \]

(in words, $S$ contains the line segment between any two of its points), then
$(s_k + s_j)/2 \in S$ and so

\[ \left\| \frac{y_k + y_j}{2} \right\| = \left\| x - \frac{s_k + s_j}{2} \right\| \ge \delta \]

Thus, (13.2) gives

\[ \|y_k - y_j\|^2 \le 2(\|y_k\|^2 + \|y_j\|^2) - 4\delta^2 \to 0 \]

as $k, j \to \infty$. Hence, if $S$ is convex, then the sequence $(y_n) = (x - s_n)$ is a
Cauchy sequence and therefore so is $(s_n)$.


If we also require that $S$ be complete, then the Cauchy sequence $(s_n)$ converges
to a vector $\hat{x} \in S$ and by the continuity of the norm, we must have $\|x - \hat{x}\| = \delta$.
Let us summarize and add a remark about uniqueness.


Theorem 13.9 Let $V$ be an inner product space and let $S$ be a complete convex
subset of $V$. Then for any $x \in V$, there exists a unique $\hat{x} \in S$ for which

\[ \|x - \hat{x}\| = \inf_{s \in S} \|x - s\| \]

The vector $\hat{x}$ is called the best approximation to $x$ in $S$.


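Proof. Only the uniqueness remains to be established. Suppose that

\[ \|x - \hat{x}\| = \delta = \|x - x'\| \]

Then, by the parallelogram law,

\[
\begin{aligned}
\|\hat{x} - x'\|^2 &= \|(x - x') - (x - \hat{x})\|^2 \\
&= 2\|x - \hat{x}\|^2 + 2\|x - x'\|^2 - \|2x - \hat{x} - x'\|^2 \\
&= 2\|x - \hat{x}\|^2 + 2\|x - x'\|^2 - 4\left\| x - \frac{\hat{x} + x'}{2} \right\|^2 \\
&\le 2\delta^2 + 2\delta^2 - 4\delta^2 = 0
\end{aligned}
\]

and so $\hat{x} = x'$. □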
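
As a small illustration of Theorem 13.9, here is a sketch for one specific complete convex set, the closed box $[0,1]^3$ in $\mathbb{R}^3$, where the best approximation is obtained by clipping each coordinate. The choice of set, the helper names `dist` and `project_onto_box`, and the random sampling check are assumptions made for this example only.

```python
import random

def dist(u, v):
    # metric induced by the standard inner product on R^n
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def project_onto_box(x, lo=0.0, hi=1.0):
    # best approximation to x in the closed convex box [lo, hi]^n
    return [min(max(t, lo), hi) for t in x]

random.seed(0)
x = [1.7, -0.4, 0.25]
x_hat = project_onto_box(x)
best = dist(x, x_hat)

# no sampled point of the box should be closer to x than x_hat
samples = [[random.uniform(0.0, 1.0) for _ in x] for _ in range(2000)]
closest_sample = min(dist(x, s) for s in samples)
print(x_hat, best, closest_sample)
```

Sampling only gives evidence, not proof; the theorem is what guarantees that the minimizer exists and is unique.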


Since any subspace S of an inner product space V is convex, Theorem 13.9 
applies to complete subspaces. However, in this case, we can say more. 


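Theorem 13.10 Let $V$ be an inner product space and let $S$ be a complete
subspace of $V$. Then for any $x \in V$, the best approximation to $x$ in $S$ is the
unique vector $x' \in S$ for which $x - x' \perp S$.
Proof. Suppose that $x - x' \perp S$, where $x' \in S$. Then for any $s \in S$, we have
$x - x' \perp s - x'$ and so

\[ \|x - s\|^2 = \|x - x'\|^2 + \|x' - s\|^2 \ge \|x - x'\|^2 \]

Hence $x' = \hat{x}$ is the best approximation to $x$ in $S$. Now we need only show that
$x - \hat{x} \perp S$, where $\hat{x}$ is the best approximation to $x$ in $S$. For any nonzero $s \in S$, a little
computation reminiscent of completing the square gives

\[
\begin{aligned}
\|x - rs\|^2 &= \langle x - rs, x - rs \rangle \\
&= \|x\|^2 - \bar{r}\langle x, s \rangle - r\langle s, x \rangle + r\bar{r}\|s\|^2 \\
&= \|x\|^2 + \|s\|^2 \left( r\bar{r} - \bar{r}\frac{\langle x, s \rangle}{\|s\|^2} - r\frac{\overline{\langle x, s \rangle}}{\|s\|^2} \right) \\
&= \|x\|^2 + \|s\|^2 \left( r - \frac{\langle x, s \rangle}{\|s\|^2} \right) \left( \bar{r} - \frac{\overline{\langle x, s \rangle}}{\|s\|^2} \right) - \frac{|\langle x, s \rangle|^2}{\|s\|^2} \\
&= \|x\|^2 + \|s\|^2 \left| r - \frac{\langle x, s \rangle}{\|s\|^2} \right|^2 - \frac{|\langle x, s \rangle|^2}{\|s\|^2}
\end{aligned}
\]

Now, this is smallest when

\[ r = r_0 = \frac{\langle x, s \rangle}{\|s\|^2} \]

in which case

\[ \|x - r_0 s\|^2 = \|x\|^2 - \frac{|\langle x, s \rangle|^2}{\|s\|^2} \]

Replacing $x$ by $x - \hat{x}$ gives

\[ \|x - \hat{x} - r_0 s\|^2 = \|x - \hat{x}\|^2 - \frac{|\langle x - \hat{x}, s \rangle|^2}{\|s\|^2} \]

But $\hat{x}$ is the best approximation to $x$ in $S$ and since $\hat{x} + r_0 s \in S$ we must have

\[ \|x - \hat{x} - r_0 s\|^2 \ge \|x - \hat{x}\|^2 \]

Hence,

\[ \frac{|\langle x - \hat{x}, s \rangle|^2}{\|s\|^2} = 0 \]

or equivalently,

\[ \langle x - \hat{x}, s \rangle = 0 \]

Hence, $x - \hat{x} \perp S$. □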
According to Theorem 13.10, if $S$ is a complete subspace of an inner product
space $V$, then for any $x \in V$, we may write

\[ x = \hat{x} + (x - \hat{x}) \]

where $\hat{x} \in S$ and $x - \hat{x} \in S^\perp$. Hence, $V = S + S^\perp$ and since $S \cap S^\perp = \{0\}$,
we also have $V = S \odot S^\perp$. This is the projection theorem for arbitrary inner
product spaces.


Theorem 13.11 (The projection theorem) If $S$ is a complete subspace of an
inner product space $V$, then

\[ V = S \odot S^\perp \]

In particular, if $S$ is a closed subspace of a Hilbert space $H$, then

\[ H = S \odot S^\perp \]

□


Theorem 13.12 Let $S$, $T$ and $T'$ be subspaces of an inner product space $V$.

1) If $V = S \odot T$ then $T = S^\perp$.

2) If $S \odot T = S \odot T'$ then $T = T'$.

Proof. If $V = S \odot T$, then $T \subseteq S^\perp$ by definition of orthogonal direct sum. On
the other hand, if $z \in S^\perp$, then $z = s + t$, for some $s \in S$ and $t \in T$. Hence,

\[ 0 = \langle z, s \rangle = \langle s, s \rangle + \langle t, s \rangle = \langle s, s \rangle \]

and so $s = 0$, implying that $z = t \in T$. Thus, $S^\perp \subseteq T$. Part 2) follows from part
1). □


Let us denote the closure of the span of a set S of vectors by cspan( S). 


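Theorem 13.13 Let $H$ be a Hilbert space.
1) If $A$ is a subset of $H$, then

\[ \operatorname{cspan}(A) = A^{\perp\perp} \]

2) If $S$ is a subspace of $H$, then

\[ \operatorname{cl}(S) = S^{\perp\perp} \]

3) If $K$ is a closed subspace of $H$, then

\[ K = K^{\perp\perp} \]

Proof. We leave it as an exercise to show that $[\operatorname{cspan}(A)]^\perp = A^\perp$. Hence

\[ H = \operatorname{cspan}(A) \odot [\operatorname{cspan}(A)]^\perp = \operatorname{cspan}(A) \odot A^\perp \]

But since $A^\perp$ is closed, we also have

\[ H = A^\perp \odot A^{\perp\perp} \]

and so by Theorem 13.12, $\operatorname{cspan}(A) = A^{\perp\perp}$. The rest follows easily from part
1). □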


In the exercises, we provide an example of a closed subspace $K$ of an inner
product space $V$ for which $K \ne K^{\perp\perp}$. Hence, we cannot drop the requirement
that $H$ be a Hilbert space in Theorem 13.13.


Corollary 13.14 If $A$ is a subset of a Hilbert space $H$, then $\operatorname{span}(A)$ is dense in
$H$ if and only if $A^\perp = \{0\}$.
Proof. As in the previous proof,

\[ H = \operatorname{cspan}(A) \odot A^\perp \]

and so $A^\perp = \{0\}$ if and only if $H = \operatorname{cspan}(A)$. □

Hilbert Bases 
We recall the following definition from Chapter 9. 


Definition A maximal orthonormal set in a Hilbert space $H$ is called a Hilbert
basis for $H$. □


Zorn's lemma can be used to show that any nontrivial Hilbert space has a Hilbert
basis. Again, we should mention that the concepts of Hilbert basis and Hamel
basis (a maximal linearly independent set) are quite different. We will show
later in this chapter that any two Hilbert bases for a Hilbert space have the same
cardinality.


Since an orthonormal set $\mathcal{O}$ is maximal if and only if $\mathcal{O}^\perp = \{0\}$, Corollary
13.14 gives the following characterization of Hilbert bases.


Theorem 13.15 Let $\mathcal{O}$ be an orthonormal subset of a Hilbert space $H$. The
following are equivalent:

1) $\mathcal{O}$ is a Hilbert basis

2) $\mathcal{O}^\perp = \{0\}$

3) $\mathcal{O}$ is a total subset of $H$, that is, $\operatorname{cspan}(\mathcal{O}) = H$. □


Part 3) of this theorem says that a subset of a Hilbert space is a Hilbert basis if 
and only if it is a total orthonormal set. 


Fourier Expansions 


We now want to take a closer look at best approximations. Our goal is to find an 
explicit expression for the best approximation to any vector x from within a 
closed subspace S of a Hilbert space H. We will find it convenient to consider 
three cases, depending on whether S has finite, countably infinite, or 
uncountable dimension. 


The Finite-Dimensional Case 


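Suppose that $\mathcal{O} = \{u_1, \ldots, u_n\}$ is an orthonormal set in a Hilbert space $H$.
Recall that the Fourier expansion of any $x \in H$, with respect to $\mathcal{O}$, is given by

\[ \hat{x} = \sum_{k=1}^{n} \langle x, u_k \rangle u_k \]

where $\langle x, u_k \rangle$ is the Fourier coefficient of $x$ with respect to $u_k$. Observe that

\[ \langle x - \hat{x}, u_k \rangle = \langle x, u_k \rangle - \langle \hat{x}, u_k \rangle = 0 \]

and so $x - \hat{x} \perp \operatorname{span}(\mathcal{O})$. Thus, according to Theorem 13.10, the Fourier
expansion $\hat{x}$ is the best approximation to $x$ in $\operatorname{span}(\mathcal{O})$. Moreover, since
$x - \hat{x} \perp \hat{x}$, we have

\[ \|\hat{x}\|^2 = \|x\|^2 - \|x - \hat{x}\|^2 \le \|x\|^2 \]

and so

\[ \|\hat{x}\| \le \|x\| \]

with equality if and only if $x = \hat{x}$, which happens if and only if $x \in \operatorname{span}(\mathcal{O})$.
Let us summarize.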


Theorem 13.16 Let $\mathcal{O} = \{u_1, \ldots, u_n\}$ be a finite orthonormal set in a Hilbert
space $H$. For any $x \in H$, the Fourier expansion $\hat{x}$ of $x$ is the best
approximation to $x$ in $\operatorname{span}(\mathcal{O})$. We also have Bessel's inequality

\[ \|\hat{x}\| \le \|x\| \]

or equivalently,

\[ \sum_{k=1}^{n} |\langle x, u_k \rangle|^2 \le \|x\|^2 \tag{13.3} \]

with equality if and only if $x \in \operatorname{span}(\mathcal{O})$. □
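
To make Theorem 13.16 concrete, the following sketch computes a Fourier expansion in $\mathbb{R}^3$ with the standard (real) inner product. The particular orthonormal set, the vector `x`, and the helper names `inner` and `fourier_expansion` are illustrative assumptions, not taken from the text.

```python
import math

def inner(x, y):
    # standard real inner product on R^n
    return sum(a * b for a, b in zip(x, y))

def fourier_expansion(x, orthonormal_set):
    # x_hat = sum of <x, u_k> u_k over the orthonormal set
    x_hat = [0.0] * len(x)
    for u in orthonormal_set:
        c = inner(x, u)  # Fourier coefficient of x with respect to u
        x_hat = [h + c * t for h, t in zip(x_hat, u)]
    return x_hat

r = 1.0 / math.sqrt(2.0)
O = [[1.0, 0.0, 0.0], [0.0, r, r]]  # orthonormal, but not a basis of R^3
x = [3.0, 1.0, 5.0]

x_hat = fourier_expansion(x, O)
residual = [a - b for a, b in zip(x, x_hat)]
bessel_lhs = sum(inner(x, u) ** 2 for u in O)

# residual is orthogonal to every u in O, and bessel_lhs <= ||x||^2
print(x_hat, residual, bessel_lhs, inner(x, x))
```

Here the inequality is strict because `x` does not lie in the span of the two chosen vectors.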
The Countably Infinite-Dimensional Case 
In the countably infinite case, we will be dealing with infinite sums and so
questions of convergence will arise. Thus, we begin with the following.


Theorem 13.17 Let $\mathcal{O} = \{u_1, u_2, \ldots\}$ be a countably infinite orthonormal set in
a Hilbert space $H$. The series

\[ \sum_{k=1}^{\infty} r_k u_k \tag{13.4} \]

converges in $H$ if and only if the series

\[ \sum_{k=1}^{\infty} |r_k|^2 \tag{13.5} \]

converges in $\mathbb{R}$. If these series converge, then they converge unconditionally
(that is, any series formed by rearranging the order of the terms also
converges). Finally, if the series (13.4) converges, then

\[ \left\| \sum_{k=1}^{\infty} r_k u_k \right\|^2 = \sum_{k=1}^{\infty} |r_k|^2 \]


Proof. Denote the partial sums of the first series by $s_n$ and the partial sums of
the second series by $p_n$. Then for $m \le n$

\[ \|s_n - s_m\|^2 = \left\| \sum_{k=m+1}^{n} r_k u_k \right\|^2 = \sum_{k=m+1}^{n} |r_k|^2 = |p_n - p_m| \]

Hence $(s_n)$ is a Cauchy sequence in $H$ if and only if $(p_n)$ is a Cauchy sequence
in $\mathbb{R}$. Since both $H$ and $\mathbb{R}$ are complete, $(s_n)$ converges if and only if $(p_n)$
converges.


If the series (13.5) converges, then it converges absolutely and hence
unconditionally. (A real series converges unconditionally if and only if it
converges absolutely.) But if (13.5) converges unconditionally, then so does
(13.4). The last part of the theorem follows from the continuity of the norm. □
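
A standard example separating the two notions takes $r_k = 1/k$. The sketch below (the coefficients and the cutoffs 1000 and 2000 are choices made for illustration) compares the tail of $\sum |r_k|^2$, which controls $\|s_n - s_m\|^2$ in the proof above, with the tail of $\sum \|r_k u_k\| = \sum |r_k|$, which is what absolute convergence would require.

```python
def tail_sum(term, m, n):
    # sum of term(k) for k = m+1, ..., n
    return sum(term(k) for k in range(m + 1, n + 1))

m, n = 1000, 2000

# ||s_n - s_m||^2 for the series sum (1/k) u_k over an orthonormal sequence
square_tail = tail_sum(lambda k: 1.0 / k ** 2, m, n)

# corresponding tail of sum ||(1/k) u_k|| = sum 1/k (the harmonic series)
norm_tail = tail_sum(lambda k: 1.0 / k, m, n)

print(square_tail, norm_tail)
```

The first tail is small, consistent with convergence of (13.4), while the second stays near $\ln 2$, so the series converges in $H$ without converging absolutely.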


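Now let $\mathcal{O} = \{u_1, u_2, \ldots\}$ be a countably infinite orthonormal set in $H$. The
Fourier expansion of a vector $x \in H$ is defined to be the sum

\[ \hat{x} = \sum_{k=1}^{\infty} \langle x, u_k \rangle u_k \tag{13.6} \]

To see that this sum converges, observe that for any $n > 0$, (13.3) gives

\[ \sum_{k=1}^{n} |\langle x, u_k \rangle|^2 \le \|x\|^2 \]

and so

\[ \sum_{k=1}^{\infty} |\langle x, u_k \rangle|^2 \le \|x\|^2 \]

which shows that the series on the left converges. Hence, according to Theorem
13.17, the Fourier expansion (13.6) converges unconditionally.


Moreover, since the inner product is continuous,

\[ \langle x - \hat{x}, u_k \rangle = \langle x, u_k \rangle - \langle \hat{x}, u_k \rangle = 0 \]

and so $x - \hat{x} \in [\operatorname{span}(\mathcal{O})]^\perp = [\operatorname{cspan}(\mathcal{O})]^\perp$. Hence, $\hat{x}$ is the best approximation
to $x$ in $\operatorname{cspan}(\mathcal{O})$. Finally, since $x - \hat{x} \perp \hat{x}$, we again have

\[ \|\hat{x}\|^2 = \|x\|^2 - \|x - \hat{x}\|^2 \le \|x\|^2 \]

and so

\[ \|\hat{x}\| \le \|x\| \]

with equality if and only if $x = \hat{x}$, which happens if and only if $x \in \operatorname{cspan}(\mathcal{O})$.
Thus, the following analog of Theorem 13.16 holds.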


Theorem 13.18 Let $\mathcal{O} = \{u_1, u_2, \ldots\}$ be a countably infinite orthonormal set in
a Hilbert space $H$. For any $x \in H$, the Fourier expansion

\[ \hat{x} = \sum_{k=1}^{\infty} \langle x, u_k \rangle u_k \]

of $x$ converges unconditionally and is the best approximation to $x$ in $\operatorname{cspan}(\mathcal{O})$.
We also have Bessel's inequality

\[ \|\hat{x}\| \le \|x\| \]

or equivalently,

\[ \sum_{k=1}^{\infty} |\langle x, u_k \rangle|^2 \le \|x\|^2 \]

with equality if and only if $x \in \operatorname{cspan}(\mathcal{O})$. □

The Arbitrary Case 


To discuss the case of an arbitrary orthonormal set $\mathcal{O} = \{u_k \mid k \in K\}$, let us
first define and discuss the concept of the sum of an arbitrary number of terms.
(This is a bit of a digression, since we could proceed without all of the coming
details, but they are interesting.)


Definition Let $\mathcal{K} = \{x_k \mid k \in K\}$ be an arbitrary family of vectors in an inner
product space $V$. The sum

\[ \sum_{k \in K} x_k \]

is said to converge to a vector $x \in V$ and we write

\[ x = \sum_{k \in K} x_k \tag{13.7} \]

if for any $\epsilon > 0$, there exists a finite set $S \subseteq K$ for which

\[ T \supseteq S,\ T \text{ finite} \Rightarrow \left\| \sum_{k \in T} x_k - x \right\| \le \epsilon \]

□


For those readers familiar with the language of convergence of nets, the set
$\mathcal{P}_0(K)$ of all finite subsets of $K$ is a directed set under inclusion (for every
$A, B \in \mathcal{P}_0(K)$ there is a $C \in \mathcal{P}_0(K)$ containing $A$ and $B$) and the function

\[ S \mapsto \sum_{k \in S} x_k \]

is a net in $V$. Convergence of (13.7) is convergence of this net. In any case, we
will refer to the preceding definition as the net definition of convergence.


It is not hard to verify the following basic properties of net convergence for 
arbitrary sums. 


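Theorem 13.19 Let $\mathcal{K} = \{x_k \mid k \in K\}$ be an arbitrary family of vectors in an
inner product space $V$. If

\[ \sum_{k \in K} x_k = x \quad \text{and} \quad \sum_{k \in K} y_k = y \]

then

1) (Linearity)

\[ \sum_{k \in K} (r x_k + s y_k) = rx + sy \]

for any $r, s \in F$

2) (Continuity)

\[ \sum_{k \in K} \langle x_k, y \rangle = \langle x, y \rangle \quad \text{and} \quad \sum_{k \in K} \langle y, x_k \rangle = \langle y, x \rangle \]

□
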
The next result gives a useful “Cauchy-type” description of convergence. 
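Theorem 13.20 Let $\mathcal{K} = \{x_k \mid k \in K\}$ be an arbitrary family of vectors in an
inner product space $V$.
1) If the sum

\[ \sum_{k \in K} x_k \]

converges, then for any $\epsilon > 0$, there exists a finite set $I \subseteq K$ such that

\[ J \cap I = \emptyset,\ J \text{ finite} \Rightarrow \left\| \sum_{k \in J} x_k \right\| \le \epsilon \]

2) If $V$ is a Hilbert space, then the converse of 1) also holds.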
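Proof. For part 1), given $\epsilon > 0$, let $S \subseteq K$, $S$ finite, be such that

\[ T \supseteq S,\ T \text{ finite} \Rightarrow \left\| \sum_{k \in T} x_k - x \right\| \le \frac{\epsilon}{2} \]

If $J \cap S = \emptyset$, $J$ finite, then

\[
\begin{aligned}
\left\| \sum_{J} x_k \right\| &= \left\| \left( \sum_{J} x_k + \sum_{S} x_k - x \right) - \left( \sum_{S} x_k - x \right) \right\| \\
&\le \left\| \sum_{J \cup S} x_k - x \right\| + \left\| \sum_{S} x_k - x \right\| \le \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon
\end{aligned}
\]

As for part 2), for each $n > 0$, let $I_n \subseteq K$ be a finite set for which

\[ J \cap I_n = \emptyset,\ J \text{ finite} \Rightarrow \left\| \sum_{j \in J} x_j \right\| \le \frac{1}{n} \]

and let

\[ y_n = \sum_{k \in I_n} x_k \]

Then $(y_n)$ is a Cauchy sequence, since

\[
\begin{aligned}
\|y_n - y_m\| &= \left\| \sum_{I_n} x_k - \sum_{I_m} x_k \right\| = \left\| \sum_{I_n - I_m} x_k - \sum_{I_m - I_n} x_k \right\| \\
&\le \left\| \sum_{I_n - I_m} x_k \right\| + \left\| \sum_{I_m - I_n} x_k \right\| \le \frac{1}{m} + \frac{1}{n} \to 0
\end{aligned}
\]

Since $V$ is assumed complete, we have $(y_n) \to y$.


Now, given $\epsilon > 0$, there exists an $N$ such that

\[ n \ge N \Rightarrow \|y_n - y\| = \left\| \sum_{I_n} x_k - y \right\| \le \frac{\epsilon}{2} \]

Choosing an integer $n \ge \max\{N, 2/\epsilon\}$ gives for $T \supseteq I_n$, $T$ finite,

\[
\begin{aligned}
\left\| \sum_{T} x_k - y \right\| &= \left\| \sum_{I_n} x_k - y + \sum_{T - I_n} x_k \right\| \\
&\le \left\| \sum_{I_n} x_k - y \right\| + \left\| \sum_{T - I_n} x_k \right\| \le \frac{\epsilon}{2} + \frac{1}{n} \le \epsilon
\end{aligned}
\]

and so $\sum_{k \in K} x_k$ converges to $y$. □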


The following theorem tells us that convergence of an arbitrary sum implies that 
only countably many terms can be nonzero so, in some sense, there is no such 
thing as a nontrivial uncountable sum. 


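Theorem 13.21 Let $\mathcal{K} = \{x_k \mid k \in K\}$ be an arbitrary family of vectors in an
inner product space $V$. If the sum

\[ \sum_{k \in K} x_k \]

converges, then at most a countable number of terms $x_k$ can be nonzero.
Proof. According to Theorem 13.20, for each $n > 0$, we can let $I_n \subseteq K$, $I_n$
finite, be such that

\[ J \cap I_n = \emptyset,\ J \text{ finite} \Rightarrow \left\| \sum_{j \in J} x_j \right\| \le \frac{1}{n} \]

Let $I = \bigcup_n I_n$. Then $I$ is countable and

\[ k \notin I \Rightarrow \{k\} \cap I_n = \emptyset \text{ for all } n \Rightarrow \|x_k\| \le \frac{1}{n} \text{ for all } n \Rightarrow x_k = 0 \]

□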


Here is the analog of Theorem 13.17.


Theorem 13.22 Let $\mathcal{O} = \{u_k \mid k \in K\}$ be an arbitrary orthonormal family of
vectors in a Hilbert space $H$. The two series

\[ \sum_{k \in K} r_k u_k \quad \text{and} \quad \sum_{k \in K} |r_k|^2 \]

converge or diverge together. If these series converge, then

\[ \left\| \sum_{k \in K} r_k u_k \right\|^2 = \sum_{k \in K} |r_k|^2 \]

Proof. The first series converges if and only if for every $\epsilon > 0$, there exists a
finite set $I \subseteq K$ such that

\[ J \cap I = \emptyset,\ J \text{ finite} \Rightarrow \left\| \sum_{k \in J} r_k u_k \right\|^2 \le \epsilon^2 \]

or equivalently,

\[ J \cap I = \emptyset,\ J \text{ finite} \Rightarrow \sum_{k \in J} |r_k|^2 \le \epsilon^2 \]

and this is precisely what it means for the second series to converge. We leave
proof of the remaining statement to the reader. □


The following is a useful characterization of arbitrary sums of nonnegative real 
terms. 


Theorem 13.23 Let $\{r_k \mid k \in K\}$ be a collection of nonnegative real numbers.
Then

\[ \sum_{k \in K} r_k = \sup_{\substack{J \text{ finite} \\ J \subseteq K}} \sum_{k \in J} r_k \tag{13.8} \]

provided that either of the preceding expressions is finite.
Proof. Suppose that

\[ \sup_{\substack{J \text{ finite} \\ J \subseteq K}} \sum_{k \in J} r_k = R < \infty \]

Then, for any $\epsilon > 0$, there exists a finite set $S \subseteq K$ such that

\[ R \ge \sum_{k \in S} r_k \ge R - \epsilon \]

Hence, if $T \subseteq K$ is a finite set for which $T \supseteq S$, then since $r_k \ge 0$,

\[ R \ge \sum_{k \in T} r_k \ge \sum_{k \in S} r_k \ge R - \epsilon \]

and so

\[ \left| R - \sum_{k \in T} r_k \right| \le \epsilon \]

which shows that $\sum r_k$ converges to $R$. Finally, if the sum on the left of (13.8)
converges, then the supremum on the right is finite and so (13.8) holds. □


The reader may have noticed that we have two definitions of convergence for
countably infinite series: the net version and the traditional version involving
the limit of partial sums. Let us write

\[ \sum_{k \in \mathbb{N}^+} x_k \quad \text{and} \quad \sum_{k=1}^{\infty} x_k \]

for the net version and the partial sum version, respectively. Here is the
relationship between these two definitions.


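Theorem 13.24 Let $H$ be a Hilbert space. If $x_k \in H$, then the following are
equivalent:

1) $\sum_{k \in \mathbb{N}^+} x_k$ converges (net version) to $x$

2) $\sum_{k=1}^{\infty} x_k$ converges unconditionally to $x$

Proof. Assume that 1) holds. Suppose that $\pi$ is any permutation of $\mathbb{N}^+$. Given
any $\epsilon > 0$, there is a finite set $S \subseteq \mathbb{N}^+$ for which

\[ T \supseteq S,\ T \text{ finite} \Rightarrow \left\| \sum_{k \in T} x_k - x \right\| \le \epsilon \]

Let us denote the set of integers $\{1, \ldots, n\}$ by $I_n$ and choose a positive integer $n$
such that $\pi(I_n) \supseteq S$. Then for $m \ge n$ we have

\[ \pi(I_m) \supseteq \pi(I_n) \supseteq S \Rightarrow \left\| \sum_{k=1}^{m} x_{\pi(k)} - x \right\| = \left\| \sum_{k \in \pi(I_m)} x_k - x \right\| \le \epsilon \]

and so 2) holds.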


Next, assume that 2) holds, but that the series in 1) does not converge. Then
there exists an $\epsilon > 0$ such that for any finite subset $I \subseteq \mathbb{N}^+$, there exists a finite
subset $J$ with $J \cap I = \emptyset$ for which

\[ \left\| \sum_{k \in J} x_k \right\| > \epsilon \]

From this, we deduce the existence of a countably infinite sequence $J_n$ of
mutually disjoint finite subsets of $\mathbb{N}^+$ with the property that

\[ \max(J_n) = M_n < m_{n+1} = \min(J_{n+1}) \]

and

\[ \left\| \sum_{k \in J_n} x_k \right\| > \epsilon \]

Now we choose any permutation $\pi: \mathbb{N}^+ \to \mathbb{N}^+$ with the following properties

1) $\pi([m_n, M_n]) \subseteq [m_n, M_n]$

2) if $J_n = \{j_{n,1}, \ldots, j_{n,u_n}\}$ then

\[ \pi(m_n) = j_{n,1},\ \pi(m_n + 1) = j_{n,2},\ \ldots,\ \pi(m_n + u_n - 1) = j_{n,u_n} \]

The intention in property 2) is that for each $n$, $\pi$ takes a set of consecutive
integers to the integers in $J_n$.


For any such permutation $\pi$, we have

\[ \left\| \sum_{k=m_n}^{m_n + u_n - 1} x_{\pi(k)} \right\| = \left\| \sum_{k \in J_n} x_k \right\| > \epsilon \]

which shows that the sequence of partial sums of the series

\[ \sum_{k=1}^{\infty} x_{\pi(k)} \]

is not Cauchy and so this series does not converge. This contradicts 2) and
shows that 2) implies at least that 1) converges. But if 1) converges to $y \in H$,
then since 1) implies 2) and since unconditional limits are unique, we have
$y = x$. Hence, 2) implies 1). □


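Now we can return to the discussion of Fourier expansions. Let
$\mathcal{O} = \{u_k \mid k \in K\}$ be an arbitrary orthonormal set in a Hilbert space $H$. Given
any $x \in H$, we may apply Theorem 13.16 to all finite subsets of $\mathcal{O}$, to deduce
that

\[ \sup_{\substack{J \text{ finite} \\ J \subseteq K}} \sum_{k \in J} |\langle x, u_k \rangle|^2 \le \|x\|^2 \]

and so Theorem 13.23 tells us that the sum

\[ \sum_{k \in K} |\langle x, u_k \rangle|^2 \]

converges. Hence, according to Theorem 13.22, the Fourier expansion

\[ \hat{x} = \sum_{k \in K} \langle x, u_k \rangle u_k \]

of $x$ also converges and

\[ \|\hat{x}\|^2 = \sum_{k \in K} |\langle x, u_k \rangle|^2 \]

Note that, according to Theorem 13.21, $\hat{x}$ is a countable sum of terms of
the form $\langle x, u_k \rangle u_k$ and so is in $\operatorname{cspan}(\mathcal{O})$.


The continuity of infinite sums with respect to the inner product (Theorem
13.19) implies that

\[ \langle x - \hat{x}, u_k \rangle = \langle x, u_k \rangle - \langle \hat{x}, u_k \rangle = 0 \]

and so $x - \hat{x} \in [\operatorname{span}(\mathcal{O})]^\perp = [\operatorname{cspan}(\mathcal{O})]^\perp$. Hence, Theorem 13.10 tells us that $\hat{x}$
is the best approximation to $x$ in $\operatorname{cspan}(\mathcal{O})$. Finally, since $x - \hat{x} \perp \hat{x}$, we again
have

\[ \|\hat{x}\|^2 = \|x\|^2 - \|x - \hat{x}\|^2 \le \|x\|^2 \]

and so

\[ \|\hat{x}\| \le \|x\| \]

with equality if and only if $x = \hat{x}$, which happens if and only if $x \in \operatorname{cspan}(\mathcal{O})$.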


Thus, we arrive at the most general form of a key theorem about Hilbert spaces. 


Theorem 13.25 Let $\mathcal{O} = \{u_k \mid k \in K\}$ be an orthonormal family of vectors in
a Hilbert space $H$. For any $x \in H$, the Fourier expansion

\[ \hat{x} = \sum_{k \in K} \langle x, u_k \rangle u_k \]

of $x$ converges in $H$ and is the unique best approximation to $x$ in $\operatorname{cspan}(\mathcal{O})$.
Moreover, we have Bessel's inequality

\[ \|\hat{x}\| \le \|x\| \]

or equivalently,

\[ \sum_{k \in K} |\langle x, u_k \rangle|^2 \le \|x\|^2 \]

with equality if and only if $x \in \operatorname{cspan}(\mathcal{O})$. □

A Characterization of Hilbert Bases 


Recall from Theorem 13.15 that an orthonormal set $\mathcal{O} = \{u_k \mid k \in K\}$ in a
Hilbert space $H$ is a Hilbert basis if and only if

\[ \operatorname{cspan}(\mathcal{O}) = H \]

Theorem 13.25 then leads to the following characterization of Hilbert bases.


Theorem 13.26 Let $\mathcal{O} = \{u_k \mid k \in K\}$ be an orthonormal family in a Hilbert
space $H$. The following are equivalent:

1) $\mathcal{O}$ is a Hilbert basis (a maximal orthonormal set)

2) $\mathcal{O}^\perp = \{0\}$

3) $\mathcal{O}$ is total (that is, $\operatorname{cspan}(\mathcal{O}) = H$)

4) $x = \hat{x}$ for all $x \in H$

5) Equality holds in Bessel's inequality for all $x \in H$, that is,

\[ \|x\| = \|\hat{x}\| \]

for all $x \in H$

6) Parseval's identity

\[ \langle x, y \rangle = \langle \hat{x}, \hat{y} \rangle \]

holds for all $x, y \in H$, that is,

\[ \langle x, y \rangle = \sum_{k \in K} \langle x, u_k \rangle \overline{\langle y, u_k \rangle} \]

Proof. Parts 1), 2) and 3) are equivalent by Theorem 13.15. Part 4) implies part
3), since $\hat{x} \in \operatorname{cspan}(\mathcal{O})$ and 3) implies 4) since the unique best approximation of
any $x \in \operatorname{cspan}(\mathcal{O})$ is itself and so $x = \hat{x}$. Parts 3) and 5) are equivalent by
Theorem 13.25. Parseval's identity follows from part 4) using Theorem 13.19.
Finally, Parseval's identity for $y = x$ implies that equality holds in Bessel's
inequality. □
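
Parseval's identity can be checked numerically in a finite-dimensional Hilbert space. The sketch below uses $\mathbb{C}^4$ with the normalized discrete Fourier vectors as the Hilbert basis; that choice of basis, the test vectors, and the helper names `inner` and `dft_basis` are assumptions made purely for illustration.

```python
import cmath

def inner(x, y):
    # linear in the first argument, conjugate-linear in the second
    return sum(a * b.conjugate() for a, b in zip(x, y))

def dft_basis(n):
    # u_k(j) = exp(2*pi*i*j*k/n) / sqrt(n): an orthonormal basis of C^n
    return [[cmath.exp(2j * cmath.pi * j * k / n) / n ** 0.5 for j in range(n)]
            for k in range(n)]

n = 4
basis = dft_basis(n)
x = [1 + 2j, 0.5j, -3 + 0j, 2 - 1j]
y = [0 + 1j, 4 + 0j, 1 - 1j, -2 + 2j]

lhs = inner(x, y)
# sum over k of <x, u_k> times the conjugate of <y, u_k>
rhs = sum(inner(x, u) * inner(y, u).conjugate() for u in basis)
# equality in Bessel's inequality: sum |<x, u_k>|^2 equals ||x||^2
bessel = sum(abs(inner(x, u)) ** 2 for u in basis)

print(lhs, rhs, bessel, inner(x, x).real)
```

Dropping one vector from `basis` would leave an orthonormal set that is no longer total, and both equalities would then generally fail.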


Hilbert Dimension 


We now wish to show that all Hilbert bases for a Hilbert space H have the same 
cardinality and so we can define the Hilbert dimension of H to be that 
cardinality. 


Theorem 13.27 All Hilbert bases for a Hilbert space $H$ have the same
cardinality. This cardinality is called the Hilbert dimension of $H$, which we
denote by $\operatorname{hdim}(H)$.

Proof. If $H$ has a finite Hilbert basis, then that set is also a Hamel basis and so
all finite Hilbert bases have size $\dim(H)$. Suppose next that $B = \{b_k \mid k \in K\}$
and $C = \{c_j \mid j \in J\}$ are infinite Hilbert bases for $H$. Then for each $b_k$ we have

\[ b_k = \sum_{j \in J_k} \langle b_k, c_j \rangle c_j \]

where $J_k$ is the countable set $\{j \mid \langle b_k, c_j \rangle \ne 0\}$. Moreover, since no $c_j$ can be
orthogonal to every $b_k$, we have $\bigcup_k J_k = J$. Thus, since each $J_k$ is countable,
we have

\[ |J| = \left| \bigcup_{k \in K} J_k \right| \le \aleph_0 |K| = |K| \]

By symmetry, we also have $|K| \le |J|$ and so the Schröder–Bernstein theorem
implies that $|J| = |K|$. □


Theorem 13.28 Two Hilbert spaces are isometrically isomorphic if and only if 
they have the same Hilbert dimension. 
Proof. Suppose that $\operatorname{hdim}(H_1) = \operatorname{hdim}(H_2)$. Let $\mathcal{O}_1 = \{u_k \mid k \in K\}$ be a
Hilbert basis for $H_1$ and $\mathcal{O}_2 = \{v_k \mid k \in K\}$ a Hilbert basis for $H_2$. We may
define a map $\tau: H_1 \to H_2$ as follows:

\[ \tau\left( \sum_{k \in K} r_k u_k \right) = \sum_{k \in K} r_k v_k \]

We leave it as an exercise to verify that $\tau$ is a bijective isometry. The converse
is also left as an exercise. □


A Characterization of Hilbert Spaces 


We have seen that any vector space $V$ is isomorphic to a vector space $(F^B)_0$ of
all functions from $B$ to $F$ that have finite support. There is a corresponding
result for Hilbert spaces. Let $K$ be any nonempty set and let

\[ \ell^2(K) = \left\{ f: K \to \mathbb{C} \;\middle|\; \sum_{k \in K} |f(k)|^2 < \infty \right\} \]

The functions in $\ell^2(K)$ are referred to as square summable functions. (We can
also define a real version of this set by replacing $\mathbb{C}$ by $\mathbb{R}$.) We define an inner
product on $\ell^2(K)$ by

\[ \langle f, g \rangle = \sum_{k \in K} f(k) \overline{g(k)} \]

The proof that $\ell^2(K)$ is a Hilbert space is quite similar to the proof that
$\ell^2 = \ell^2(\mathbb{N})$ is a Hilbert space and the details are left to the reader. If we define
$\delta_k \in \ell^2(K)$ by

\[ \delta_k(j) = \delta_{k,j} = \begin{cases} 1 & \text{if } j = k \\ 0 & \text{if } j \ne k \end{cases} \]

then the collection

\[ \mathcal{O} = \{\delta_k \mid k \in K\} \]

is a Hilbert basis for $\ell^2(K)$, of cardinality $|K|$. To see this, observe that

\[ \langle \delta_i, \delta_j \rangle = \sum_{k \in K} \delta_i(k) \overline{\delta_j(k)} = \delta_{i,j} \]

and so $\mathcal{O}$ is orthonormal. Moreover, if $f \in \ell^2(K)$, then $f(k) \ne 0$ for only a
countable number of $k \in K$, say $\{k_1, k_2, \ldots\}$. If we define $f'$ by

\[ f' = \sum_{i=1}^{\infty} f(k_i) \delta_{k_i} \]

then $f' \in \operatorname{cspan}(\mathcal{O})$ and $f'(j) = f(j)$ for all $j \in K$, which implies that $f = f'$.
This shows that $\ell^2(K) = \operatorname{cspan}(\mathcal{O})$ and so $\mathcal{O}$ is a total orthonormal set, that is, a
Hilbert basis for $\ell^2(K)$.


Now let $H$ be a Hilbert space, with Hilbert basis $B = \{u_k \mid k \in K\}$. We define
a map $\phi: H \to \ell^2(K)$ as follows. Since $B$ is a Hilbert basis, any $x \in H$ has the
form

\[ x = \sum_{k \in K} \langle x, u_k \rangle u_k \]

Since the series on the right converges, Theorem 13.22 implies that the series

\[ \sum_{k \in K} |\langle x, u_k \rangle|^2 \]

converges. Hence, another application of Theorem 13.22 implies that the
following series converges:

\[ \phi(x) = \sum_{k \in K} \langle x, u_k \rangle \delta_k \]

It follows from Theorem 13.19 that $\phi$ is linear and it is not hard to see that it is
also bijective. Notice that $\phi(u_k) = \delta_k$ and so $\phi$ takes the Hilbert basis $B$ for $H$
to the Hilbert basis $\mathcal{O}$ for $\ell^2(K)$.


Notice also that

\[ \|\phi(x)\|^2 = \langle \phi(x), \phi(x) \rangle = \sum_{k \in K} |\langle x, u_k \rangle|^2 = \left\| \sum_{k \in K} \langle x, u_k \rangle u_k \right\|^2 = \|x\|^2 \]

and so $\phi$ is an isometric isomorphism. We have proved the following theorem.

Theorem 13.29 If $H$ is a Hilbert space of Hilbert dimension $\kappa$ and if $K$ is any
set of cardinality $\kappa$, then $H$ is isometrically isomorphic to $\ell^2(K)$.

The Riesz Representation Theorem 


We conclude our discussion of Hilbert spaces by discussing the Riesz
representation theorem. As it happens, not all linear functionals on a Hilbert
space have the form “take the inner product with...,” as in the
finite-dimensional case. To see this, observe that if $y \in H$, then the function

\[ f_y(x) = \langle x, y \rangle \]

is certainly a linear functional on $H$. However, it has a special property. In
particular, the Cauchy–Schwarz inequality gives, for all $x \in H$,

\[ |f_y(x)| = |\langle x, y \rangle| \le \|x\| \, \|y\| \]

or, for all $x \ne 0$,

\[ \frac{|f_y(x)|}{\|x\|} \le \|y\| \]

Noticing that equality holds if $x = y$, we have

\[ \sup_{x \ne 0} \frac{|f_y(x)|}{\|x\|} = \|y\| \]

This prompts us to make the following definition, which we do for linear 
transformations between Hilbert spaces (this covers the case of linear 
functionals). 


Definition Let $\tau: H_1 \to H_2$ be a linear transformation from $H_1$ to $H_2$. Then $\tau$ is
said to be bounded if

\[ \sup_{x \ne 0} \frac{\|\tau x\|}{\|x\|} < \infty \]

If the supremum on the left is finite, we denote it by $\|\tau\|$ and call it the norm of
$\tau$. □


Of course, if $f: H \to F$ is a bounded linear functional on $H$, then

\[ \|f\| = \sup_{x \ne 0} \frac{|f(x)|}{\|x\|} \]


The set of all bounded linear functionals on a Hilbert space H is called the 
continuous dual space, or conjugate space, of H and denoted by H*. Note 
that this differs from the algebraic dual of H, which is the set of all linear 
functionals on H. In the finite-dimensional case, however, since all linear 
functionals are bounded (exercise), the two concepts agree. (Unfortunately, 
there is no universal agreement on the notation for the algebraic dual versus the 
continuous dual. Since we will discuss only the continuous dual in this section, 
no confusion should arise.) 


The following theorem gives some simple reformulations of the definition of 
norm. 


Theorem 13.30 Let $\tau: H_1 \to H_2$ be a bounded linear transformation.

1) $\|\tau\| = \sup_{\|x\| = 1} \|\tau x\|$

2) $\|\tau\| = \sup_{\|x\| \le 1} \|\tau x\|$

3) $\|\tau\| = \inf\{c \in \mathbb{R} \mid \|\tau x\| \le c\|x\| \text{ for all } x \in H_1\}$ □
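
Formulation 1) suggests a crude way to estimate a norm numerically: sample unit vectors and take the largest value of $\|\tau x\|$. The sketch below does this for the operator on $\mathbb{R}^2$ with matrix $\operatorname{diag}(3, 1)$, whose norm is 3; the matrix, the sampling scheme, and the helper names are assumptions for illustration, and sampling only ever gives a lower estimate of the supremum.

```python
import math

def apply(matrix, x):
    # matrix-vector product, with the matrix given as a list of rows
    return [sum(a * t for a, t in zip(row, x)) for row in matrix]

def norm(x):
    return math.sqrt(sum(t * t for t in x))

def estimate_operator_norm(matrix, samples=3600):
    # largest ||tau x|| over evenly spaced unit vectors in the plane
    best = 0.0
    for i in range(samples):
        theta = 2.0 * math.pi * i / samples
        unit = [math.cos(theta), math.sin(theta)]
        best = max(best, norm(apply(matrix, unit)))
    return best

tau = [[3.0, 0.0], [0.0, 1.0]]
estimate = estimate_operator_norm(tau)

# formulation 3): ||tau x|| <= ||tau|| * ||x|| for every x
x = [-2.0, 5.0]
print(estimate, norm(apply(tau, x)), 3.0 * norm(x))
```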


The following theorem explains the importance of bounded linear 
transformations. 


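Theorem 13.31 Let $\tau: H_1 \to H_2$ be a linear transformation. The following are
equivalent:

1) $\tau$ is bounded

2) $\tau$ is continuous at some point $x_0 \in H_1$

3) $\tau$ is continuous.

Proof. Suppose that $\tau$ is bounded. Then

\[ \|\tau x - \tau x_0\| = \|\tau(x - x_0)\| \le \|\tau\| \, \|x - x_0\| \to 0 \]

as $x \to x_0$. Hence, $\tau$ is continuous at $x_0$. Thus, 1) implies 2). If 2) holds, then
for any $y \in H_1$, we have

\[ \|\tau x - \tau y\| = \|\tau(x - y + x_0) - \tau(x_0)\| \to 0 \]

as $x \to y$, since $\tau$ is continuous at $x_0$ and $x - y + x_0 \to x_0$ as $x \to y$. Hence, $\tau$
is continuous at any $y \in H_1$ and 3) holds. Finally, suppose that 3) holds. Thus, $\tau$
is continuous at 0 and so there exists a $\delta > 0$ such that

\[ \|x\| \le \delta \Rightarrow \|\tau x\| \le 1 \]

In particular,

\[ \|x\| = 1 \Rightarrow \|\tau(\delta x)\| \le 1 \Rightarrow \|\tau x\| \le \frac{1}{\delta} \]

and so

\[ x \ne 0 \Rightarrow \left\| \tau\left( \frac{x}{\|x\|} \right) \right\| \le \frac{1}{\delta} \Rightarrow \frac{\|\tau x\|}{\|x\|} \le \frac{1}{\delta} \]

Thus, $\tau$ is bounded. □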
Now we can state and prove the Riesz representation theorem. 


Theorem 13.32 (The Riesz representation theorem) Let $H$ be a Hilbert
space. For any bounded linear functional $f$ on $H$, there is a unique $z_0 \in H$
such that

\[ f(x) = \langle x, z_0 \rangle \]

for all $x \in H$. Moreover, $\|z_0\| = \|f\|$.
Proof. If $f = 0$, we may take $z_0 = 0$, so let us assume that $f \ne 0$. Hence,
$K = \ker(f) \ne H$ and since $f$ is continuous, $K$ is closed. Thus

\[ H = K \odot K^\perp \]

Now, the first isomorphism theorem, applied to the linear functional $f: H \to F$,
implies that $H/K \approx F$ (as vector spaces). In addition, Theorem 3.5 implies that
$H/K \approx K^\perp$ and so $K^\perp \approx F$. In particular, $\dim(K^\perp) = 1$.


For any $z \in K^\perp$, we have

\[ x \in K \Rightarrow f(x) = 0 = \langle x, z \rangle \]

Since $\dim(K^\perp) = 1$, all we need do is find $0 \ne z \in K^\perp$ for which

\[ f(z) = \langle z, z \rangle \]

for then $f(rz) = rf(z) = r\langle z, z \rangle = \langle rz, z \rangle$ for all $r \in F$, showing that
$f(x) = \langle x, z \rangle$ for $x \in K^\perp$ as well.


But if $0 \ne z \in K^\perp$, then

\[ z_0 = \frac{\overline{f(z)}}{\langle z, z \rangle} z \]

has this property, as can be easily checked. The fact that $\|z_0\| = \|f\|$ has
already been established. □
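
The construction in the proof can be traced in $\mathbb{C}^3$. In this sketch the functional is $f(x) = \sum_j a_j x_j$ for a fixed coefficient vector `a` (an assumption made for illustration), so that $\ker(f)^\perp$ is spanned by the conjugate of `a`; starting from an arbitrary nonzero `z` in that line, the formula for $z_0$ recovers the representing vector.

```python
def inner(x, y):
    # linear in the first argument, conjugate-linear in the second
    return sum(p * q.conjugate() for p, q in zip(x, y))

a = [2 + 1j, -1j, 3 + 0j]  # hypothetical coefficients defining f

def f(x):
    # bounded linear functional f(x) = sum of a_j * x_j
    return sum(c * t for c, t in zip(a, x))

# any nonzero vector of ker(f)^perp, here twice the conjugate of a
z = [2 * c.conjugate() for c in a]

# z0 = (conjugate of f(z) / <z, z>) * z, as in the proof
scale = f(z).conjugate() / inner(z, z)
z0 = [scale * t for t in z]

x = [1 - 1j, 4 + 2j, -0.5j]
print(f(x), inner(x, z0), z0)
```

Whatever nonzero multiple is chosen for `z`, the resulting `z0` is the same, in line with the uniqueness claim of the theorem.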


Exercises

1. Prove that the sup metric on the metric space $C[a, b]$ of continuous
functions on $[a, b]$ does not come from an inner product. Hint: let $f(t) = 1$
and $g(t) = (t - a)/(b - a)$ and consider the parallelogram law.

2. Prove that any Cauchy sequence that has a convergent subsequence must
itself converge.

3. Let $V$ be an inner product space and let $A$ and $B$ be subsets of $V$. Show
that

a) $A \subseteq B \Rightarrow B^\perp \subseteq A^\perp$

b) $A^\perp$ is a closed subspace of $V$

c) $[\operatorname{cspan}(A)]^\perp = A^\perp$

4. Let $V$ be an inner product space and $S \subseteq V$. Under what conditions is
$S^{\perp\perp\perp} = S^\perp$?

5. Prove that a subspace $S$ of a Hilbert space $H$ is closed if and only if
$S = S^{\perp\perp}$.

6. Let $V$ be the subspace of $\ell^2$ consisting of all sequences of real numbers
with the property that each sequence has only a finite number of nonzero
terms. Thus, $V$ is an inner product space. Let $K$ be the subspace of $V$
consisting of all sequences $x = (x_n)$ in $V$ with the property that
$\sum x_n/n = 0$. Show that $K$ is closed, but that $K^{\perp\perp} \ne K$. Hint: For the latter,
show that $K^\perp = \{0\}$ by considering the sequences $u = (1, \ldots, -n, \ldots)$,
where the term $-n$ is in the $n$th coordinate position.

7. Let $\mathcal{O} = \{u_1, u_2, \ldots\}$ be an orthonormal set in $H$. If $x = \sum r_k u_k$ converges,
show that

\[ \|x\|^2 = \sum_{k=1}^{\infty} |r_k|^2 \]

8. Prove that if an infinite series

\[ \sum_{k=1}^{\infty} x_k \]

converges absolutely in a Hilbert space $H$, then it also converges in the
sense of the “net” definition given in this section.

9. Let $\{r_k \mid k \in K\}$ be a collection of nonnegative real numbers. If the sum
on the left below converges, show that

\[ \sum_{k \in K} r_k = \sup_{\substack{J \text{ finite} \\ J \subseteq K}} \sum_{k \in J} r_k \]

10. Find a countably infinite sum of real numbers that converges in the sense of
partial sums, but not in the sense of nets.

11. Prove that if a Hilbert space $H$ has infinite Hilbert dimension, then no
Hilbert basis for $H$ is a Hamel basis.

12. Prove that $\ell^2(K)$ is a Hilbert space for any nonempty set $K$.


13. 


14. 
15. 
16. 
17. 
18. 


19, 


Hilbert Spaces 353 


Prove that any linear transformation between finite-dimensional Hilbert 
spaces is bounded. 

Prove that if f € H*, then ker( f) is a closed subspace of H. 

Prove that a Hilbert space is separable if and only if hdim(H) < No. 

Can a Hilbert space have countably infinite Hamel dimension? 

What is the Hamel dimension of (N)? 

Let 7 and o be bounded linear operators on H. Verify the following: 

a) |irrl| = llr 

b) [Ir toll <[irll + llel 

©) |Iroll < [Frill 

Use the Riesz representation theorem to show that H* ~ H for any Hilbert 
space H. 


Chapter 14 
Tensor Products 


In the preceding chapters, we have seen several ways to construct new vector 
spaces from old ones. Two of the most important such constructions are the 
direct sum $U \oplus V$ and the vector space $\mathcal{L}(U,V)$ of all linear transformations
from U to V. In this chapter, we consider another very important construction, 
known as the tensor product. 


Universality 


We begin by describing a general type of universality that will help motivate the 
definition of tensor product. Our description is strongly related to the formal 
notion of a universal pair in category theory, but we will be somewhat less 
formal to avoid the need to formally define categorical concepts. Accordingly, 
the terminology that we shall introduce is not standard, but does not contradict 
any standard terminology. 


Referring to Figure 14.1, consider a set A and two functions f and g, with 
domain A. 


[Figure 14.1: commutative triangle with $f: A \to S$, $\tau: S \to X$ and $g: A \to X$]


Suppose that there exists a function $\tau: S \to X$ for which this diagram commutes, that is,


$$g = \tau \circ f$$


This is sometimes expressed by saying that g can be factored through f. What 
does this say about the relationship between the functions f and g? 




Let us think of the “information” about $A$ contained in a function $h: A \to B$ as the way in which $h$ distinguishes elements of $A$ using labels from $B$. The relationship above implies that

$$g(a) \ne g(b) \;\Rightarrow\; f(a) \ne f(b)$$

and this can be phrased by saying that whatever ability $g$ has to distinguish elements of $A$ is also possessed by $f$. Put another way, except for labeling differences, any information about $A$ that is contained in $g$ is also contained in $f$.

If $\tau$ happens to be injective, then the only difference between $f$ and $g$ is the values of the labels. That is, the two functions have the same information about $A$. However, in general, $\tau$ is not required to be injective and so $f$ may contain more information than $g$.
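For functions on a finite set, the existence of a factorization $g = \tau \circ f$ can be tested directly: it exists exactly when $f$ never merges two elements that $g$ distinguishes. The sketch below is only an illustration; the helper name `factor_through` and the sample functions are assumptions, not notation from the text.

```python
def factor_through(f, g, A):
    """Return tau (a dict defined on f(A)) with g = tau o f, or None if no such tau exists."""
    tau = {}
    for a in A:
        label = f(a)
        if label in tau and tau[label] != g(a):
            return None            # f merges two elements that g distinguishes
        tau[label] = g(a)
    return tau

A = range(12)
f = lambda a: a % 6                # finer labeling of A
g = lambda a: a % 3                # coarser labeling of A

tau = factor_through(f, g, A)
assert tau is not None and all(tau[f(a)] == g(a) for a in A)

# g carries less information than f, so f cannot be factored through g.
assert factor_through(g, f, A) is None
```

In this example $\tau$ is not injective (it sends both 0 and 3 to 0), which matches the remark that $f$ may contain more information than $g$.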


Now consider a family $\mathcal{S}$ of sets and a family

$$\mathcal{F} = \{g: A \to X \mid X \in \mathcal{S}\}$$

Assume that $S \in \mathcal{S}$ and $f: A \to S \in \mathcal{F}$. If the diagram in Figure 14.1 commutes for all $g \in \mathcal{F}$, then the information contained in every function in $\mathcal{F}$ is also contained in $f$. Moreover, since $f \in \mathcal{F}$, the function $f$ cannot contain more information than is contained in the entire family and so we conclude that $f$ contains exactly the same information as is contained in the entire family $\mathcal{F}$. In this sense, $f: A \to S$ is universal among all functions $g: A \to X$ in $\mathcal{F}$.


In this way, a single function $f: A \to S$, or more precisely, a single pair $(S, f)$,
can capture a mathematical concept as described by a family of functions. Some 
examples from linear algebra are basis for a vector space, quotient space, direct 
sum and bilinearity (as we will see). 


Let us make a formal definition. 
Definition Referring to Figure 14.2, let $A$ be a set and let $\mathcal{S}$ be a family of sets. Let

$$\mathcal{F} = \{g: A \to X \mid X \in \mathcal{S}\}$$

be a family of functions, all of which have domain $A$ and range a member of $\mathcal{S}$. Let

$$\mathcal{H} = \{\tau: X \to Y \mid X, Y \in \mathcal{S}\}$$

be a family of functions with domain and range in $\mathcal{S}$. We assume that $\mathcal{H}$ has the following structure:

1) $\mathcal{H}$ contains the identity function $\iota_S$ for each member $S$ of $\mathcal{S}$.
2) $\mathcal{H}$ is closed under composition of functions, which is an associative operation.
3) For any $\tau \in \mathcal{H}$ and $f \in \mathcal{F}$, the composition $\tau \circ f$ is defined and belongs to $\mathcal{F}$.

[Figure 14.2]

We refer to $\mathcal{H}$ as the measuring family and its members as measuring functions.

A pair $(S, f: A \to S)$, where $S \in \mathcal{S}$ and $f \in \mathcal{F}$, has the universal property for the family $\mathcal{F}$ as measured by $\mathcal{H}$, or is a universal pair for $(\mathcal{F}, \mathcal{H})$, if for every $g: A \to X$ in $\mathcal{F}$, there is a unique $\tau: S \to X$ in $\mathcal{H}$ for which the diagram in Figure 14.1 commutes, that is, for which

$$g = \tau \circ f$$

or equivalently, any $g \in \mathcal{F}$ can be factored through $f$. The unique measuring function $\tau$ is called the mediating morphism for $g$. □


Note the requirement that the mediating morphism $\tau$ be unique. Universal pairs are essentially unique, as the following describes.


Theorem 14.1 Let $(S, f: A \to S)$ and $(T, g: A \to T)$ be universal pairs for $(\mathcal{F}, \mathcal{H})$. Then there is a bijective measuring function $\mu \in \mathcal{H}$ for which $\mu S = T$. In fact, the mediating morphism of $f$ with respect to $g$ and the mediating morphism of $g$ with respect to $f$ are isomorphisms.

Proof. With reference to Figure 14.3, there are mediating morphisms $\tau: S \to T$ and $\sigma: T \to S$ for which

$$g = \tau \circ f$$
$$f = \sigma \circ g$$

Hence,

$$g = (\tau \circ \sigma) \circ g$$
$$f = (\sigma \circ \tau) \circ f$$

However, referring to the third diagram in Figure 14.3, both $\sigma \circ \tau: S \to S$ and the identity map $\iota: S \to S$ are mediating morphisms for $f$ and so the uniqueness of mediating morphisms implies that $\sigma \circ \tau = \iota$. Similarly $\tau \circ \sigma = \iota$ and so $\tau$ and $\sigma$ are inverses of one another, making $\tau$ the desired bijection. □


[Figure 14.3: three commutative triangles: $g = \tau \circ f$ with $\tau: S \to T$; $f = \sigma \circ g$ with $\sigma: T \to S$; and $f = (\sigma \circ \tau) \circ f$ with $\sigma \circ \tau = \iota: S \to S$]


Examples of Universality 


Now let us look at some examples of the universal property. Let Vect( F) denote 
the family of all vector spaces over the base field F. (We use the term family 
informally to represent what in set theory is formally referred to as a class. A 
class is a “collection” that is too large to be considered a set. For example, 
Vect(F) is a class.) 


Example 14.1 (Bases) Let $\mathcal{B}$ be a nonempty set and let

1) $\mathcal{S} = \operatorname{Vect}(F)$
2) $\mathcal{F}$ = set functions from $\mathcal{B}$ to members of $\mathcal{S}$
3) $\mathcal{H}$ = linear transformations

If $V_\mathcal{B}$ is a vector space with basis $\mathcal{B}$, then the pair $(V_\mathcal{B}, j: \mathcal{B} \to V_\mathcal{B})$, where $j$ is the inclusion map $jv = v$, is universal for $(\mathcal{F}, \mathcal{H})$. To see this, note that the condition that $g \in \mathcal{F}$ can be factored through $j$,

$$g = \tau \circ j$$

is equivalent to the statement that $\tau v = gv$ for each basis vector $v \in \mathcal{B}$. But this uniquely defines a linear transformation $\tau$.

In fact, the universality of the pair $(V_\mathcal{B}, j)$ is precisely the statement that a linear transformation $\tau$ is uniquely determined by assigning its values arbitrarily on a basis $\mathcal{B}$, the function $g$ doing the arbitrary assignment in this context. Note also that Theorem 14.1 implies that if $(W, k: \mathcal{B} \to W)$ is also universal for $(\mathcal{F}, \mathcal{H})$, then there is a bijective mediating morphism from $V_\mathcal{B}$ to $W$, that is, $W$ and $V_\mathcal{B}$ are isomorphic. □
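The extension of an arbitrary assignment on a basis to a linear map can be sketched concretely. In the illustration below, which rests on assumptions not in the text, vectors of $V_\mathcal{B}$ are stored as dictionaries from basis elements to coefficients, the target space is $\mathbb{Q}^2$, and the names `extend_linearly`, `g` and `j` are hypothetical.

```python
from fractions import Fraction

def extend_linearly(g):
    """Mediating morphism tau: V_B -> Q^2 determined by tau(b) = g(b).

    A vector of V_B is a dict {basis element: coefficient} with finite support."""
    def tau(vec):
        x = y = Fraction(0)
        for b, coeff in vec.items():
            gx, gy = g(b)
            x += coeff * gx
            y += coeff * gy
        return (x, y)
    return tau

# An arbitrary set function g on the basis B = {b1, b2, b3}.
g = {"b1": (1, 0), "b2": (1, 1), "b3": (0, 2)}.__getitem__
tau = extend_linearly(g)
j = lambda b: {b: Fraction(1)}     # inclusion of B into V_B

# The diagram commutes: tau o j = g on every basis element.
assert all(tau(j(b)) == g(b) for b in ("b1", "b2", "b3"))

# tau is then forced on every other vector, e.g. 2*b1 - b3.
assert tau({"b1": Fraction(2), "b3": Fraction(-1)}) == (2, -2)
```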


Example 14.2 (Quotient spaces and canonical projections) Let $V$ be a vector space and let $K$ be a subspace of $V$. Let

1) $\mathcal{S} = \operatorname{Vect}(F)$
2) $\mathcal{F}$ = linear maps with domain $V$, whose kernels contain $K$
3) $\mathcal{H}$ = linear transformations

Theorem 3.4 says precisely that the pair $(V/K, \pi: V \to V/K)$, where $\pi$ is the canonical projection map, has the universal property for $\mathcal{F}$ as measured by $\mathcal{H}$. □


Example 14.3 (Direct sums) Let $U$ and $V$ be vector spaces over $F$. Let

1) $\mathcal{S} = \operatorname{Vect}(F)$
2) $\mathcal{F}$ = ordered pairs $(f: U \to W, g: V \to W)$ of linear transformations
3) $\mathcal{H}$ = linear transformations

Here we have a slight variation on the definition of universal pair: In this case, $\mathcal{F}$ is a family of pairs of functions. For $\tau \in \mathcal{H}$ and $(f, g) \in \mathcal{F}$, we set

$$\tau \circ (f, g) = (\tau \circ f, \tau \circ g)$$

Then the pair $(U \boxplus V, (j_1, j_2): (U, V) \to U \boxplus V)$, where

$$j_1 u = (u, 0) \quad\text{and}\quad j_2 v = (0, v)$$

are the canonical injections, has the universal property for $(\mathcal{F}, \mathcal{H})$. To see this, observe that for any pair $(f, g): (U, V) \to W$ in $\mathcal{F}$, the condition

$$(f, g) = \tau \circ (j_1, j_2)$$

is equivalent to

$$(f, g) = (\tau \circ j_1, \tau \circ j_2)$$

or

$$\tau(u, 0) = f(u) \quad\text{and}\quad \tau(0, v) = g(v)$$

But these conditions define a unique linear transformation $\tau: U \boxplus V \to W$. □


Thus, bases, quotient spaces and direct sums are all examples of universal pairs 
and it should be clear from these examples that the notion of universal property 
is, well, universal. In fact, it happens that the most useful definition of tensor 
product is through a universal property, which we now explore. 


Bilinear Maps 

The universality that defines tensor products rests on the notion of a bilinear 
map. 

Definition Let $U$, $V$ and $W$ be vector spaces over $F$. Let $U \times V$ be the cartesian product of $U$ and $V$ as sets. A set function

$$f: U \times V \to W$$

is bilinear if it is linear in both variables separately, that is, if

$$f(ru + su', v) = rf(u, v) + sf(u', v)$$

and

$$f(u, rv + sv') = rf(u, v) + sf(u, v')$$

The set of all bilinear functions from $U \times V$ to $W$ is denoted by $\hom_F(U, V; W)$. A bilinear function $f: U \times V \to F$ with values in the base field $F$ is called a bilinear form on $U \times V$. □

Note that bilinearity can also be expressed in matrix language as follows: If

$$a = (a_1, \dots, a_n) \in F^n, \quad b = (b_1, \dots, b_m) \in F^m$$

and

$$u = (u_1, \dots, u_n) \in U^n, \quad v = (v_1, \dots, v_m) \in V^m$$

then $f: U \times V \to W$ is bilinear if

$$f(au^t, bv^t) = aFb^t$$

where $F = [f(u_i, v_j)]_{i,j}$.

It is important to emphasize that, in the definition of bilinear function, U x V is 
the cartesian product of sets, not the direct product of vector spaces. In other 


words, we do not consider any algebraic structure on U x V when defining 
bilinear functions, so expressions like 


$(x, y) + (z, w)$ and $r(x, y)$


are meaningless. 


In fact, if $V$ is a vector space, there are two classes of functions from $V \times V$ to
$W$: the linear maps $\mathcal{L}(V \times V, W)$, where $V \times V = V \boxplus V$ is the direct
product of vector spaces, and the bilinear maps $\hom(V, V; W)$, where $V \times V$ is
just the cartesian product of sets. We leave it as an exercise to show that these 
two classes have only the zero map in common. In other words, the only map 
that is both linear and bilinear is the zero map. 
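The difference between the two notions shows up already for the dot product on $\mathbb{Q}^2$. The following sketch is an illustration under that assumption (the helper names `dot`, `add` and `scale` are hypothetical): the map is linear in each variable separately, but scaling the pair $(u, v)$ by $r$ in the direct product scales the value by $r^2$, so the map is not linear on $V \boxplus V$.

```python
from fractions import Fraction as Fr

def dot(x, y):
    """A sample bilinear map Q^2 x Q^2 -> Q."""
    return x[0] * y[0] + x[1] * y[1]

def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def scale(r, x):
    return (r * x[0], r * x[1])

u, u2, v = (Fr(1), Fr(2)), (Fr(3), Fr(-1)), (Fr(5), Fr(7))
r, s = Fr(2), Fr(-3)

# Linear in the first variable, with the second variable held fixed.
assert dot(add(scale(r, u), scale(s, u2)), v) == r * dot(u, v) + s * dot(u2, v)

# In the direct product, r(u, v) = (ru, rv); a linear map would scale by r,
# but the bilinear map scales by r^2.
assert dot(scale(r, u), scale(r, v)) == r * r * dot(u, v)
assert dot(scale(r, u), scale(r, v)) != r * dot(u, v)
```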


We made a thorough study of bilinear forms on a finite-dimensional vector 
space V in Chapter 11 (although this material is not assumed here). However, 
bilinearity is far more important and far-reaching than its application to metric 
vector spaces, as the following examples show. Indeed, both multiplication and 
evaluation are bilinear. 


Example 14.4 (Multiplication is bilinear) If $A$ is an algebra, the product map $\mu: A \times A \to A$ defined by

$$\mu(a, b) = ab$$

is bilinear, that is, multiplication is linear in each position. □


Example 14.5 (Evaluation is bilinear) If $V$ and $W$ are vector spaces, then the evaluation map $\phi: \mathcal{L}(V, W) \times V \to W$ defined by

$$\phi(f, v) = fv$$

is bilinear. In particular, the evaluation map $\phi: V^* \times V \to F$ defined by $\phi(f, v) = fv$ is a bilinear form on $V^* \times V$. □


Example 14.6 If $V$ and $W$ are vector spaces, and $f \in V^*$ and $g \in W^*$, then the product map $\phi: V \times W \to F$ defined by

$$\phi(v, w) = f(v)g(w)$$

is bilinear. Dually, if $v \in V$ and $w \in W$, then the map $\lambda: V^* \times W^* \to F$ defined by

$$\lambda(f, g) = f(v)g(w)$$

is bilinear. □


It is precisely the tensor product that will allow us to generalize the previous example. In particular, if $\tau \in \mathcal{L}(U, W)$ and $\sigma \in \mathcal{L}(V, W)$, then we would like to consider a “product” map $\phi: U \times V \to W$ defined by

$$\phi(u, v) = \tau(u) \;?\; \sigma(v)$$

The tensor product $\otimes$ is just the thing to replace the question mark, because it has the desired bilinearity property, as we will see. In fact, the tensor product is bilinear and nothing else, so it is exactly what we need!


Tensor Products 


Let $U$ and $V$ be vector spaces. Our guide for the definition of the tensor product $U \otimes V$ will be the desire to have a universal property for bilinear functions, as measured by linearity. Referring to Figure 14.4, we want to define a vector space $T$ and a bilinear map $t: U \times V \to T$ so that any bilinear map $f$ with domain $U \times V$ can be factored through $t$. Intuitively speaking, $t$ is the most “general” or “universal” bilinear map with domain $U \times V$: It is bilinear and nothing more.


[Figure 14.4: commutative triangle with $t: U \times V \to T$ bilinear, $f: U \times V \to W$ bilinear and $\tau: T \to W$ linear]


Definition Let $U \times V$ be the cartesian product of two vector spaces over $F$. Let $\mathcal{S} = \operatorname{Vect}(F)$. Let

$$\mathcal{F} = \bigcup_{W \in \mathcal{S}} \hom_F(U, V; W)$$

be the family of all bilinear maps from $U \times V$ to any vector space $W$. The measuring family $\mathcal{H}$ is the family of all linear transformations.

A pair $(T, t: U \times V \to T)$ is universal for bilinearity if it is universal for $(\mathcal{F}, \mathcal{H})$, that is, if for every bilinear map $f: U \times V \to W$, there is a unique linear transformation $\tau: T \to W$ for which

$$f = \tau \circ t$$

The map $\tau$ is called the mediating morphism for $f$. □
We can now define the tensor product via this universal property. 


Definition Let $U$ and $V$ be vector spaces over a field $F$. Any universal pair $(T, t: U \times V \to T)$ for bilinearity is called a tensor product of $U$ and $V$. The vector space $T$ is denoted by $U \otimes V$ and sometimes referred to by itself as the tensor product. The map $t$ is called the tensor map and the elements of $U \otimes V$ are called tensors.

It is customary to use the symbol $\otimes$ to denote the image of any ordered pair $(u, v)$ under the tensor map, that is,

$$u \otimes v = t(u, v)$$

for any $u \in U$ and $v \in V$. A tensor of the form $u \otimes v$ is said to be decomposable, that is, the decomposable tensors are the images under the tensor map. □


Since universal pairs are unique up to isomorphism, we may refer to “the” tensor product of vector spaces. Note also that the tensor product $\otimes$ is not a product in the sense of a binary operation on a set. In fact, even when $V = U$, the tensor product $u \otimes u$ is not in $U$, but rather in $U \otimes U$.




As we will see, there are other, more constructive ways to define the tensor 
product. Since we have adopted the universal pair definition, the other ways to 
define tensor product are, for us, constructions rather than definitions. Let us 
examine some of these constructions. 


Construction I: Intuitive but Not Coordinate Free 


The universal property for bilinearity captures the essence of bilinearity and the tensor map is the most “general” bilinear function on $U \times V$. To see how this universality can be achieved in a constructive manner, let $\{e_i \mid i \in I\}$ be a basis for $U$ and let $\{f_j \mid j \in J\}$ be a basis for $V$. Then a bilinear map $t$ on $U \times V$ is uniquely determined by assigning arbitrary values to the “basis” pairs $(e_i, f_j)$ and extending by bilinearity, that is, if $u = \sum \alpha_i e_i$ and $v = \sum \beta_j f_j$, then

$$t(u, v) = t\Big(\sum \alpha_i e_i, \sum \beta_j f_j\Big) = \sum \alpha_i \beta_j\, t(e_i, f_j)$$

Now, the tensor map $t$, being the most general bilinear map, must do this and nothing more. To achieve this goal, we define the tensor map $t$ on the pairs $(e_i, f_j)$ in such a way that the images $t(e_i, f_j)$ do not interact, and then extend by bilinearity.

In particular, for each ordered pair $(e_i, f_j)$, we invent a new formal symbol, written $e_i \otimes f_j$, and define $T$ to be the vector space with basis

$$\mathcal{D} = \{e_i \otimes f_j \mid i \in I, j \in J\}$$

The tensor map is defined by setting $t(e_i, f_j) = e_i \otimes f_j$ and extending by bilinearity. Thus,

$$t(u, v) = t\Big(\sum \alpha_i e_i, \sum \beta_j f_j\Big) = \sum \alpha_i \beta_j (e_i \otimes f_j)$$

To see that the pair $(T, t)$ is the tensor product of $U$ and $V$, if $g: U \times V \to W$ is bilinear, the universality condition $g = \tau \circ t$ is equivalent to

$$\tau(e_i \otimes f_j) = g(e_i, f_j)$$

which does indeed uniquely define a linear map $\tau: T \to W$. Hence, $(T, t)$ has the universal property for bilinearity and so we can write $T = U \otimes V$ and refer to $t$ as the tensor map.

Note that while the set $\mathcal{D} = \{e_i \otimes f_j\}$ is a basis for $T$ (by definition), the set

$$\{u \otimes v \mid u \in U, v \in V\}$$

of decomposable tensors spans $T$, but is not linearly independent. This does cause some initial confusion during the learning process. For example, one cannot define a linear map on $U \otimes V$ by assigning values arbitrarily to the decomposable tensors, nor is it always easy to tell when a tensor $\sum u_i \otimes v_i$ is equal to 0. We will consider the latter issue in some detail a bit later in the chapter.


The fact that $\mathcal{D}$ is a basis for $U \otimes V$ gives the following.


Theorem 14.2 For finite-dimensional vector spaces $U$ and $V$,

$$\dim(U \otimes V) = \dim(U) \cdot \dim(V)$$ □
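In coordinates, Construction I can be sketched as follows. The representation is an assumption made for illustration: $u$ and $v$ are given by coefficient lists with respect to the chosen bases, and a tensor is stored as a dictionary whose key $(i, j)$ holds the coefficient of $e_i \otimes f_j$. The names `tensor` and `lin_comb` are hypothetical.

```python
def tensor(u, v):
    """Tensor map t in coordinates: u = sum a_i e_i and v = sum b_j f_j
    are sent to sum a_i b_j (e_i (x) f_j), stored as {(i, j): a_i * b_j}."""
    return {(i, j): a * b for i, a in enumerate(u) for j, b in enumerate(v)}

def lin_comb(r, x, s, y):
    """r*x + s*y for coordinate lists or for coefficient dicts with equal keys."""
    if isinstance(x, dict):
        return {k: r * x[k] + s * y[k] for k in x}
    return [r * a + s * b for a, b in zip(x, y)]

u, u2, v = [1, 2], [0, -3], [4, 5, 6]
r, s = 2, -1

# Bilinearity in the first coordinate: t(ru + su', v) = r t(u, v) + s t(u', v).
assert tensor(lin_comb(r, u, s, u2), v) == lin_comb(r, tensor(u, v), s, tensor(u2, v))

# One coefficient per basis tensor e_i (x) f_j, so dim(U (x) V) = dim(U) * dim(V).
assert len(tensor(u, v)) == len(u) * len(v)
```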


Construction II: Coordinate Free


The previous construction of the tensor product is reasonably intuitive, but has 
the disadvantage of not being coordinate free. The following approach does not 
require the choice of a basis. 


Let $F_{U \times V}$ be the vector space over $F$ with basis $U \times V$. Let $S$ be the subspace of $F_{U \times V}$ generated by all vectors of the form

$$r(u, w) + s(v, w) - (ru + sv, w) \tag{14.1}$$

and

$$r(u, v) + s(u, w) - (u, rv + sw) \tag{14.2}$$

where $r, s \in F$ and $u$, $v$ and $w$ are in the appropriate spaces. Note that these vectors are precisely what we must “identify” as the zero vector in order to enforce bilinearity. Put another way, these vectors are 0 if the ordered pairs are replaced by tensors according to our previous construction.

Accordingly, the quotient space

$$U \otimes V = \frac{F_{U \times V}}{S}$$

is also sometimes taken as the definition of the tensor product of $U$ and $V$. (Strictly speaking, we should not be using the symbol $U \otimes V$ until we have shown that this is the tensor product.) The elements of $U \otimes V$ have the form

$$\Big(\sum r_i (u_i, v_i)\Big) + S = \sum r_i [(u_i, v_i) + S]$$

However, since $r(u, v) - (ru, v) \in S$ and $r(u, v) - (u, rv) \in S$, we can absorb the scalar in either coordinate, that is,

$$r[(u, v) + S] = (ru, v) + S = (u, rv) + S$$

and so the elements of $U \otimes V$ can be written simply as

$$\sum [(u_i, v_i) + S]$$

It is customary to denote the coset $(u, v) + S$ by $u \otimes v$, and so any element of $U \otimes V$ has the form

$$\sum u_i \otimes v_i$$

as in the previous construction.

The tensor map $t: U \times V \to U \otimes V$ is defined by

$$t(u, v) = u \otimes v = (u, v) + S$$

This map is bilinear, since

$$\begin{aligned}
t(ru + sv, w) &= (ru + sv, w) + S \\
&= [r(u, w) + s(v, w)] + S \\
&= [r(u, w) + S] + [s(v, w) + S] \\
&= rt(u, w) + st(v, w)
\end{aligned}$$

and similarly for the second coordinate.



We next prove that the pair $(U \otimes V, t: U \times V \to U \otimes V)$ is universal for bilinearity when $U \otimes V$ is defined as a quotient space $F_{U \times V}/S$.


Theorem 14.3 Let $U$ and $V$ be vector spaces. The pair

$$(U \otimes V, t: U \times V \to U \otimes V)$$

is the tensor product of $U$ and $V$.

Proof. Consider the diagram in Figure 14.5. Here $F_{U \times V}$ is the vector space with basis $U \times V$.

[Figure 14.5: the inclusion $j: U \times V \to F_{U \times V}$ followed by the canonical projection $\pi: F_{U \times V} \to F_{U \times V}/S = U \otimes V$, with $t = \pi \circ j$, together with maps $f$, $\sigma$ and $\tau$ into $W$]

Since

$$\pi \circ j(u, v) = \pi(u, v) = (u, v) + S = u \otimes v = t(u, v)$$

we have

$$t = \pi \circ j$$

If $f: U \times V \to W$ is bilinear, the universal property of vector spaces described in Example 14.1 implies that there is a unique linear transformation $\sigma: F_{U \times V} \to W$ for which

$$\sigma \circ j = f$$

Note that $\sigma$ sends the vectors (14.1) and (14.2) that generate $S$ to the zero vector and so $S \subseteq \ker(\sigma)$. For example,

$$\begin{aligned}
\sigma[r(u, w) + s(v, w) - (ru + sv, w)] &= \sigma[rj(u, w) + sj(v, w) - j(ru + sv, w)] \\
&= r\sigma j(u, w) + s\sigma j(v, w) - \sigma j(ru + sv, w) \\
&= rf(u, w) + sf(v, w) - f(ru + sv, w) \\
&= 0
\end{aligned}$$

and similarly for the second coordinate. Hence, Theorem 3.4 (the universal property described in Example 14.2) implies that there exists a unique linear transformation $\tau: U \otimes V \to W$ for which

$$\tau \circ \pi = \sigma$$

Hence,

$$\tau \circ t = \tau \circ \pi \circ j = \sigma \circ j = f$$

As to uniqueness, if $\tau' \circ t = f$, then

$$\tau'[(u, v) + S] = f(u, v) = \tau[(u, v) + S]$$

and since the cosets $(u, v) + S$ generate $F_{U \times V}/S$, we conclude that $\tau' = \tau$. Thus, $\tau$ is the mediating morphism and $(U \otimes V, t)$ is universal for bilinearity. □


Let us take a moment to compare the two previous constructions. Let $\{e_i \mid i \in I\}$ and $\{f_j \mid j \in J\}$ be bases for $U$ and $V$, respectively. Let $(T', t')$ be the tensor product as constructed using these two bases and let $(T, t) = (F_{U \times V}/S, t)$ be the tensor product construction using quotient spaces.

Since both of these pairs are universal for bilinearity, Theorem 14.1 implies that the mediating morphism $\tau$ for $t$ with respect to $t'$, that is, the map $\tau: T' \to T$ defined by

$$\tau(e_i \otimes f_j) = (e_i, f_j) + S$$

is a vector space isomorphism. Therefore, the basis $\{e_i \otimes f_j\}$ of $T'$ is sent to the set $\{(e_i, f_j) + S\}$, which is therefore a basis for $T$.

In other words, given any two bases $\{e_i \mid i \in I\}$ and $\{f_j \mid j \in J\}$ for $U$ and $V$, respectively, the tensors $e_i \otimes f_j$ form a basis for $U \otimes V$, regardless of which construction of the tensor product we use. Therefore, we are free to think of $e_i \otimes f_j$ either as a formal symbol belonging to a basis for $U \otimes V$ or as the coset $(e_i, f_j) + S$ belonging to a basis for $U \otimes V$.


Bilinearity on $U \times V$ Equals Linearity on $U \otimes V$


The universal property for bilinearity says that to each bilinear function $f: U \times V \to W$, there corresponds a unique linear function $\tau: U \otimes V \to W$, called the mediating morphism for $f$. Thus, we can define the mediating morphism map

$$\phi: \hom(U, V; W) \to \mathcal{L}(U \otimes V, W)$$

by setting $\phi f = \tau$. In other words, $\phi f$ is the unique linear map for which

$$(\phi f)(u \otimes v) = f(u, v)$$

Observe that $\phi$ is itself linear, since if $f, g \in \hom(U, V; W)$, then

$$[r\phi(f) + s\phi(g)](u \otimes v) = rf(u, v) + sg(u, v) = (rf + sg)(u, v)$$

and so $r\phi(f) + s\phi(g)$ is the mediating morphism for $rf + sg$, that is,

$$r\phi(f) + s\phi(g) = \phi(rf + sg)$$

Also, $\phi$ is surjective, since if $\tau: U \otimes V \to W$ is any linear map, then $f = \tau \circ t: U \times V \to W$ is bilinear and has mediating morphism $\tau$, that is, $\phi f = \tau$. Finally, $\phi$ is injective, for if $\phi f = 0$, then $f = \phi f \circ t = 0$. We have established the following result.


Theorem 14.4 Let $U$, $V$ and $W$ be vector spaces over $F$. Then the mediating morphism map $\phi: \hom(U, V; W) \to \mathcal{L}(U \otimes V, W)$, where $\phi f$ is the unique linear map satisfying $f = \phi f \circ t$, is an isomorphism and so

$$\hom(U, V; W) \approx \mathcal{L}(U \otimes V, W)$$ □
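The correspondence $f \mapsto \phi f$ can be made concrete in coordinates. The sketch below assumes $U = \mathbb{Q}^2$, $V = \mathbb{Q}^3$ and $W = \mathbb{Q}$ with standard bases, and stores a tensor as a dictionary of coefficients on the basis tensors $e_i \otimes f_j$; the names `tensor` and `mediating` and the sample bilinear map are illustrative assumptions. The mediating morphism is determined by the values $f(e_i, f_j)$ and then satisfies $(\phi f)(u \otimes v) = f(u, v)$ for all $u$ and $v$.

```python
def tensor(u, v):
    """Coordinates of u (x) v on the basis tensors e_i (x) f_j."""
    return {(i, j): a * b for i, a in enumerate(u) for j, b in enumerate(v)}

def mediating(f, m, n):
    """Linear map tau on U (x) V determined by tau(e_i (x) f_j) = f(e_i, f_j)."""
    unit = lambda i, d: [1 if k == i else 0 for k in range(d)]
    values = {(i, j): f(unit(i, m), unit(j, n)) for i in range(m) for j in range(n)}
    def tau(z):                    # z = {(i, j): coefficient}
        return sum(c * values[ij] for ij, c in z.items())
    return tau

# A sample bilinear map Q^2 x Q^3 -> Q.
f = lambda u, v: u[0] * v[1] - u[1] * v[0] + 2 * u[1] * v[2]
tau = mediating(f, 2, 3)

u, v = [3, -1], [2, 0, 5]
assert tau(tensor(u, v)) == f(u, v)    # f = tau o t
```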


When Is a Tensor Product Zero? 


Armed with the universal property of bilinearity, we can now discuss some of the basic properties of tensor products. Let us first consider the question of when a tensor $\sum u_i \otimes v_i$ is zero.

The bilinearity of the tensor product gives

$$0 \otimes v = (0 + 0) \otimes v = 0 \otimes v + 0 \otimes v$$

and so $0 \otimes v = 0$. Similarly, $u \otimes 0 = 0$. Now suppose that

$$\sum_i u_i \otimes v_i = 0$$

where we may assume that none of the vectors $u_i$ and $v_i$ are 0. Let $f: U \times V \to W$ be a bilinear map and let $\tau: U \otimes V \to W$ be its mediating morphism, that is, $\tau \circ t = f$. Then

$$0 = \tau\Big(\sum_i u_i \otimes v_i\Big) = \sum_i (\tau \circ t)(u_i, v_i) = \sum_i f(u_i, v_i)$$

The key point is that this holds for any bilinear function $f: U \times V \to W$. In particular, let $\alpha \in U^*$ and $\beta \in V^*$ and define $f$ by

$$f(u, v) = \alpha(u)\beta(v)$$

which is easily seen to be bilinear. Then the previous display becomes

$$\sum_i \alpha(u_i)\beta(v_i) = 0$$

If, for example, the vectors $u_i$ are linearly independent, we can take $\alpha$ to be a dual vector $u_k^*$, to get

$$0 = \sum_i u_k^*(u_i)\beta(v_i) = \beta(v_k)$$

and since this holds for all linear functionals $\beta \in V^*$, it follows that $v_k = 0$. We have proved the following useful result.


Theorem 14.5 If $u_1, \dots, u_n$ are linearly independent vectors in $U$ and $v_1, \dots, v_n$ are arbitrary vectors in $V$, then

$$\sum u_i \otimes v_i = 0 \;\Rightarrow\; v_i = 0 \text{ for all } i$$

In particular, $u \otimes v = 0$ if and only if $u = 0$ or $v = 0$. □
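The role of linear independence in Theorem 14.5 can be seen in a small coordinate example. The sketch below assumes $U = V = \mathbb{Q}^2$ with standard bases and stores a tensor as its matrix of coefficients on $e_i \otimes f_j$; the helper names are hypothetical. With independent first coordinates the sum vanishes only if every $v_i$ does, whereas with dependent first coordinates a sum of nonzero decomposable tensors can be zero.

```python
def tensor(u, v):
    """Coefficient matrix of u (x) v on the basis tensors e_i (x) f_j."""
    return [[a * b for b in v] for a in u]

def add(z, w):
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(z, w)]

def is_zero(z):
    return all(c == 0 for row in z for c in row)

# Independent first coordinates e_1, e_2: the rows of the sum are v_1 and v_2,
# so the sum is zero only when both v_i are zero.
z = add(tensor([1, 0], [2, 4]), tensor([0, 1], [-1, -2]))
assert z == [[2, 4], [-1, -2]] and not is_zero(z)

# Dependent first coordinates u_2 = 2 u_1: the sum collapses to
# u_1 (x) (v_1 + 2 v_2) = u_1 (x) 0 = 0 although no v_i is zero.
w = add(tensor([1, 0], [2, 4]), tensor([2, 0], [-1, -2]))
assert is_zero(w)
```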


Coordinate Matrices and Rank 


If $\mathcal{B} = \{u_i \mid i \in I\}$ is a basis for $U$ and $\mathcal{C} = \{v_j \mid j \in J\}$ is a basis for $V$, then any vector $z \in U \otimes V$ has a unique expression as a sum

$$z = \sum_{i \in I} \sum_{j \in J} r_{i,j}(u_i \otimes v_j)$$

where only a finite number of the coefficients $r_{i,j}$ are nonzero. In fact, for a fixed $z \in U \otimes V$, we may reindex the bases so that

$$z = \sum_{i=1}^{a} \sum_{j=1}^{b} r_{i,j}(u_i \otimes v_j)$$

where none of the rows or columns of the matrix $R = (r_{i,j})$ consists only of 0's. The matrix $R = (r_{i,j})$ is called a coordinate matrix of $z$ with respect to the bases $\mathcal{B}$ and $\mathcal{C}$.

Note that a coordinate matrix $R$ is determined only up to the order of its rows and columns. We could remove this ambiguity by considering ordered bases, but this is not necessary for our discussion and adds a complication, since the bases may be infinite.


Suppose that $\mathcal{W} = \{w_i \mid i \in I\}$ and $\mathcal{X} = \{x_j \mid j \in J\}$ are also bases for $U$ and $V$, respectively, and that

$$z = \sum_{i=1}^{c} \sum_{j=1}^{d} s_{i,j}(w_i \otimes x_j)$$

where $S = (s_{i,j})$ is a coordinate matrix of $z$ with respect to these bases. We claim that the coordinate matrices $R$ and $S$ have the same rank, which can then be defined as the rank of the tensor $z \in U \otimes V$.

Each $w_1, \dots, w_c$ is a finite linear combination of basis vectors in $\mathcal{B}$, perhaps involving some of $u_1, \dots, u_a$ and perhaps involving other vectors in $\mathcal{B}$. We can further reindex $\mathcal{B}$ so that each $w_i$ is a linear combination of the vectors $\mathcal{B}' = (u_1, \dots, u_n)$, where $a \le n$ and set

$$U_n = \operatorname{span}(u_1, \dots, u_n)$$

Next, extend $(w_1, \dots, w_c)$ to a basis $\mathcal{W}' = (w_1, \dots, w_c, w_{c+1}, \dots, w_n)$ for $U_n$. (Since we no longer need the rest of the basis $\mathcal{W}$, we have commandeered the symbols $w_{c+1}, \dots, w_n$, for simplicity.) Hence

$$w_i = \sum_{h=1}^{n} a_{i,h} u_h \quad\text{for } i = 1, \dots, n$$

where $A = (a_{i,h})$ is invertible of size $n \times n$.

Now repeat this process on the second coordinate. Reindex the basis $\mathcal{C}$ so that the subspace $V_m = \operatorname{span}(v_1, \dots, v_m)$ contains $x_1, \dots, x_d$ and extend to a basis $\mathcal{X}' = (x_1, \dots, x_d, x_{d+1}, \dots, x_m)$ for $V_m$. Then

$$x_j = \sum_{k=1}^{m} b_{j,k} v_k \quad\text{for } j = 1, \dots, m$$

where $B = (b_{j,k})$ is invertible of size $m \times m$.

Next, write

$$z = \sum_{i=1}^{n} \sum_{j=1}^{m} r_{i,j}(u_i \otimes v_j)$$

by setting $r_{i,j} = 0$ for $i > a$ or $j > b$. Thus, the $n \times m$ matrix $R_1 = (r_{i,j})$ comes from $R$ by adding $n - a$ rows of 0's to the bottom and then $m - b$ columns of 0's. In particular, $R_1$ and $R$ have the same rank.


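The expression for $z$ in terms of the basis vectors $w_1, \dots, w_c$ and $x_1, \dots, x_d$ can also be extended using 0 coefficients to

$$z = \sum_{i=1}^{n} \sum_{j=1}^{m} s_{i,j}(w_i \otimes x_j)$$

where the $n \times m$ matrix $S_1 = (s_{i,j})$ has the same rank as $S$.

Now at last, we can compute. First, bilinearity gives

$$w_i \otimes x_j = \sum_{h=1}^{n} \sum_{k=1}^{m} a_{i,h} b_{j,k} (u_h \otimes v_k)$$

and so

$$\begin{aligned}
z &= \sum_{i=1}^{n} \sum_{j=1}^{m} s_{i,j}(w_i \otimes x_j) \\
&= \sum_{i=1}^{n} \sum_{j=1}^{m} s_{i,j} \sum_{h=1}^{n} \sum_{k=1}^{m} a_{i,h} b_{j,k} (u_h \otimes v_k) \\
&= \sum_{h=1}^{n} \sum_{k=1}^{m} \Big( \sum_{i=1}^{n} \sum_{j=1}^{m} a_{i,h} s_{i,j} b_{j,k} \Big) (u_h \otimes v_k) \\
&= \sum_{h=1}^{n} \sum_{k=1}^{m} (A^t S_1 B)_{h,k} (u_h \otimes v_k)
\end{aligned}$$

Thus

$$\sum_{i=1}^{n} \sum_{j=1}^{m} r_{i,j}(u_i \otimes v_j) = z = \sum_{h=1}^{n} \sum_{k=1}^{m} (A^t S_1 B)_{h,k} (u_h \otimes v_k)$$

and so $R_1 = A^t S_1 B$. Since $A$ and $B$ are invertible, we deduce that

$$\operatorname{rk}(R) = \operatorname{rk}(R_1) = \operatorname{rk}(S_1) = \operatorname{rk}(S)$$

as desired. Moreover, in block matrix terms, we can write

$$R_1 = \begin{bmatrix} R & 0 \\ 0 & 0 \end{bmatrix}_{\text{block}} \quad\text{and}\quad S_1 = \begin{bmatrix} S & 0 \\ 0 & 0 \end{bmatrix}_{\text{block}}$$

and if we write

$$A = \begin{bmatrix} A_{c,a} & * \\ * & * \end{bmatrix}_{\text{block}} \quad\text{and}\quad B = \begin{bmatrix} B_{d,b} & * \\ * & * \end{bmatrix}_{\text{block}}$$

then $R_1 = A^t S_1 B$ implies that

$$R = A_{c,a}^t S B_{d,b}$$

We shall soon have use for the following special case. If

$$z = \sum_{i=1}^{r} u_i \otimes v_i = \sum_{i=1}^{r} w_i \otimes x_i \tag{14.3}$$

then $R = S = I_r$ and so

$$w_i = \sum_{h=1}^{n} a_{i,h} u_h \quad\text{for } i = 1, \dots, r$$

and

$$x_j = \sum_{k=1}^{m} b_{j,k} v_k \quad\text{for } j = 1, \dots, r$$

where if $A_{r,r} = (a_{i,h})$ and $B_{r,r} = (b_{j,k})$ denote the upper left $r \times r$ blocks of $A$ and $B$, then

$$I_r = A_{r,r}^t B_{r,r}$$
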

The Rank of a Decomposable Tensor 


Recall that a tensor of the form $u \otimes v$ is said to be decomposable. If $\{u_i \mid i \in I\}$ is a basis for $U$ and $\{v_j \mid j \in J\}$ is a basis for $V$, then any decomposable vector has the form

$$u \otimes v = \sum_{i,j} r_i s_j (u_i \otimes v_j)$$

Hence, the rank of a nonzero decomposable vector is 1, since the rank of a nonzero matrix whose $(i, j)$th entry is $r_i s_j$ is 1.

Characterizing Vectors in a Tensor Product 

There are several useful representations of the tensors in $U \otimes V$.

Theorem 14.6 Let $\{u_i \mid i \in I\}$ be a basis for $U$ and let $\{v_j \mid j \in J\}$ be a basis for $V$. By an “essentially unique” sum, we mean unique up to order and presence of zero terms.

1) Every $z \in U \otimes V$ has an essentially unique expression as a finite sum of the form

$$\sum_{i,j} r_{i,j}\, u_i \otimes v_j$$

where $r_{i,j} \in F$ and the tensors $u_i \otimes v_j$ are distinct.

2) Every $z \in U \otimes V$ has an essentially unique expression as a finite sum of the form

$$\sum_i u_i \otimes y_i$$

where $y_i \in V$ and the $u_i$'s are distinct.

3) Every $z \in U \otimes V$ has an essentially unique expression as a finite sum of the form

$$\sum_i x_i \otimes v_i$$

where $x_i \in U$ and the $v_i$'s are distinct.

4) Every nonzero $z \in U \otimes V$ has an expression of the form

$$z = \sum_{i=1}^{n} x_i \otimes y_i$$

where the $x_i$'s are distinct, the $y_i$'s are distinct and the sets $\{x_i\} \subseteq U$ and $\{y_i\} \subseteq V$ are linearly independent. As to uniqueness, $n$ is the rank of $z$ and so it is unique. Also, the equation

$$\sum_{i=1}^{r} x_i \otimes y_i = \sum_{i=1}^{r} w_i \otimes z_i$$

where the $w_i$'s are distinct, the $z_i$'s are distinct and $\{w_i\} \subseteq U$ and $\{z_i\} \subseteq V$ are linearly independent, holds if and only if there exist invertible $r \times r$ matrices $A = (a_{i,j})$ and $B = (b_{i,j})$ for which $A^t B = I$ and

$$w_i = \sum_{j=1}^{r} a_{i,j} x_j \quad\text{and}\quad z_i = \sum_{j=1}^{r} b_{i,j} y_j$$

for $i = 1, \dots, r$.
Proof. Part 1) merely expresses the fact that $\{u_i \otimes v_j\}$ is a basis for $U \otimes V$. For part 2), we write

$$\sum_{i,j} r_{i,j}\, u_i \otimes v_j = \sum_i \Big[ u_i \otimes \sum_j r_{i,j} v_j \Big] = \sum_i u_i \otimes y_i$$

Uniqueness follows from Theorem 14.5. Part 3) is proved similarly. As to part 4), we start with the expression from part 2):

$$\sum_{i=1}^{n} u_i \otimes y_i$$

where we may assume that none of the $y_i$'s are 0. If the set $\{y_i\}$ is linearly independent, we are done. If not, then we may suppose (after reindexing if necessary) that

$$y_n = \sum_{i=1}^{n-1} r_i y_i$$

Then

$$\begin{aligned}
\sum_{i=1}^{n} u_i \otimes y_i &= \sum_{i=1}^{n-1} u_i \otimes y_i + \Big( u_n \otimes \sum_{i=1}^{n-1} r_i y_i \Big) \\
&= \sum_{i=1}^{n-1} u_i \otimes y_i + \sum_{i=1}^{n-1} (r_i u_n \otimes y_i) \\
&= \sum_{i=1}^{n-1} (u_i + r_i u_n) \otimes y_i
\end{aligned}$$

But the vectors $\{u_i + r_i u_n \mid 1 \le i \le n - 1\}$ are linearly independent. This reduction can be repeated until the second coordinates are linearly independent. Moreover, the identity matrix $I_n$ is a coordinate matrix for $z$ and so $n = \operatorname{rk}(I_n) = \operatorname{rk}(z)$. As to uniqueness, one direction was proved earlier; see (14.3). The other direction is left to the reader. □


The proof of Theorem 14.6 shows that if $z \ne 0$ and

$$z = \sum_{i \in I} s_i \otimes t_i$$

where $s_i \in U$ and $t_i \in V$, then if the multiset $\{s_i \mid i \in I\}$ is not linearly independent, we can rewrite $z$ in the form

$$z = \sum_{i \in I_0} s_i \otimes t_i'$$

where $\{s_i \mid i \in I_0\}$ is linearly independent. Then we can do the same for the second coordinate to arrive at the representation

$$z = \sum_{i=1}^{\operatorname{rk}(z)} x_i \otimes y_i$$

where the multisets $\{x_i\}$ and $\{y_i\}$ are linearly independent sets. Therefore, $\operatorname{rk}(z) \le |I|$ and so the rank of $z$ is the smallest integer $m$ for which $z$ can be written as a sum of $m$ decomposable tensors. This is often taken as the definition of the rank of a tensor.

However, we caution the reader that there is another meaning to the word rank when applied to a tensor, namely, it is the number of indices required to write the tensor. Thus, a scalar has rank 0, a vector has rank 1, the tensor $z$ above has rank 2 and a tensor of the form

$$z = \sum_{i \in I} s_i \otimes t_i \otimes u_i$$

has rank 3.

Defining Linear Transformations on a Tensor Product 


One of the simplest and most useful ways to define a linear transformation $\sigma$ on the tensor product $U \otimes V$ is through the universal property, for this property says precisely that a bilinear function $f$ on $U \times V$ gives rise to a unique (and well-defined) linear transformation on $U \otimes V$. The proof of the following theorem illustrates this well.


Theorem 14.7 Let $U$ and $V$ be vector spaces. There is a unique linear transformation

$$\theta: U^* \otimes V^* \to (U \otimes V)^*$$

defined by $\theta(f \otimes g) = f \odot g$ where

$$(f \odot g)(u \otimes v) = f(u)g(v)$$

Moreover, $\theta$ is an embedding and is an isomorphism if $U$ and $V$ are finite-dimensional. Thus, the tensor product $f \otimes g$ of linear functionals is (via this embedding) a linear functional on tensor products.

Proof. Informally, for fixed $f$ and $g$, the function $(u,v) \mapsto f(u)g(v)$ is bilinear
in $u$ and $v$ and so there is a unique linear map $f \odot g$ taking $u \otimes v$ to $f(u)g(v)$.
The function $(f,g) \mapsto f \odot g$ is bilinear in $f$ and $g$ since

$$(rf + sg) \odot h = r(f \odot h) + s(g \odot h)$$

and so there is a unique linear map $\theta$ taking $f \otimes g$ to $f \odot g$.

More formally, for fixed $f$ and $g$, the map $F_{f,g}\colon U \times V \to F$ defined by

$$F_{f,g}(u,v) = f(u)g(v)$$

is bilinear and so the universal property of tensor products implies that there
exists a unique $f \odot g \in (U \otimes V)^*$ for which

$$(f \odot g)(u \otimes v) = f(u)g(v)$$

Next, the map $G\colon U^* \times V^* \to (U \otimes V)^*$ defined by

$$G(f,g) = f \odot g$$

is bilinear since, for example,

$$
\begin{aligned}
[(rf + sg) \odot h](u \otimes v) &= (rf + sg)(u) \cdot h(v) \\
  &= r f(u)h(v) + s g(u)h(v) \\
  &= [r(f \odot h) + s(g \odot h)](u \otimes v)
\end{aligned}
$$

which shows that $G$ is linear in its first coordinate. Hence, the universal
property implies that there exists a unique linear map

$$\theta\colon U^* \otimes V^* \to (U \otimes V)^*$$

for which

$$\theta(f \otimes g) = f \odot g$$

To see that $\theta$ is an injection, if $h \in U^* \otimes V^*$ is nonzero, then we may write $h$ in
the form

$$h = \sum_{i=1}^{n} f_i \otimes g_i$$

where the $f_i \in U^*$ are nonzero and $\{g_i \mid 1 \le i \le n\} \subseteq V^*$ is linearly
independent. If $\theta(h) = 0$, then for any $u \in U$ and $v \in V$, we have

$$0 = \theta(h)(u \otimes v) = \sum_{i=1}^{n} \theta(f_i \otimes g_i)(u \otimes v) = \sum_{i=1}^{n} f_i(u) g_i(v)$$

Hence, for each nonzero $u \in U$, the linear functional

$$\sum_{i=1}^{n} f_i(u) g_i$$

is the zero map and so the linear independence of $\{g_i\}$ implies that $f_i(u) = 0$
for all $i$. Since $u$ is arbitrary, it follows that $f_i = 0$ for all $i$ and so $h = 0$.
Finally, in the finite-dimensional case, the map $\theta$ is a bijection since

$$\dim(U^* \otimes V^*) = \dim((U \otimes V)^*) < \infty$$ □

Combining the isomorphisms of Theorem 14.4 and Theorem 14.7, we have, for 
finite-dimensional vector spaces U and V, 


$$U^* \otimes V^* \approx (U \otimes V)^* \approx \hom(U, V; F)$$


The Tensor Product of Linear Transformations 


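We wish to generalize Theorem 14.7 to arbitrary linear transformations. Let
$\tau \in \mathcal{L}(U,U')$ and $\sigma \in \mathcal{L}(V,V')$. While the product $\tau(u)\sigma(v)$ does not make
sense, the tensor product $\tau u \otimes \sigma v$ does and is bilinear in $u$ and $v$, that is, the
following function is bilinear:

$$f(u,v) = \tau u \otimes \sigma v$$

The same argument that we used in the proof of Theorem 14.7 will work here.
Namely, the map $(u,v) \mapsto \tau u \otimes \sigma v$ from $U \times V$ to $U' \otimes V'$ is bilinear in $u$ and
$v$ and so there is a unique linear map $(\tau \odot \sigma)\colon U \otimes V \to U' \otimes V'$ for which

$$(\tau \odot \sigma)(u \otimes v) = \tau u \otimes \sigma v$$

The function

$$\phi\colon \mathcal{L}(U,U') \times \mathcal{L}(V,V') \to \mathcal{L}(U \otimes V, U' \otimes V')$$

defined by

$$\phi(\tau,\sigma) = \tau \odot \sigma$$

is bilinear, since

$$
\begin{aligned}
((a\tau + b\mu) \odot \sigma)(u \otimes v) &= (a\tau + b\mu)(u) \otimes \sigma v \\
  &= (a\tau u + b\mu u) \otimes \sigma v \\
  &= a[\tau u \otimes \sigma v] + b[\mu u \otimes \sigma v] \\
  &= a(\tau \odot \sigma)(u \otimes v) + b(\mu \odot \sigma)(u \otimes v) \\
  &= (a(\tau \odot \sigma) + b(\mu \odot \sigma))(u \otimes v)
\end{aligned}
$$

and similarly for the second coordinate. Hence, there is a unique linear
transformation

$$\theta\colon \mathcal{L}(U,U') \otimes \mathcal{L}(V,V') \to \mathcal{L}(U \otimes V, U' \otimes V')$$

satisfying

$$\theta(\tau \otimes \sigma) = \tau \odot \sigma$$

that is,

$$[\theta(\tau \otimes \sigma)](u \otimes v) = \tau u \otimes \sigma v$$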

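To see that $\theta$ is injective, if $h \in \mathcal{L}(U,U') \otimes \mathcal{L}(V,V')$ is nonzero, then we may
write

$$h = \sum_{i=1}^{n} f_i \otimes g_i$$

where the $f_i \in \mathcal{L}(U,U')$ are nonzero and the set $\{g_i\} \subseteq \mathcal{L}(V,V')$ is linearly
independent. If $\theta(h) = 0$, then for all $u \in U$ and $v \in V$ we have

$$0 = \theta(h)(u \otimes v) = \sum_{i=1}^{n} \theta(f_i \otimes g_i)(u \otimes v) = \sum_{i=1}^{n} f_i(u) \otimes g_i(v)$$

Since $h \ne 0$, it follows that $f_i \ne 0$ for some $i$ and so we may choose a $u \in U$
such that $f_i(u) \ne 0$ for some $i$. Moreover, we may assume, by reindexing if
necessary, that the set $\{f_1(u),\dots,f_m(u)\}$ is a maximal linearly independent
subset of $\{f_1(u),\dots,f_n(u)\}$. Hence, for each $k > m$, we have

$$f_k(u) = \sum_{i=1}^{m} \alpha_{k,i} f_i(u)$$

and so

$$
\begin{aligned}
0 &= \sum_{i=1}^{n} f_i(u) \otimes g_i(v) \\
  &= \sum_{i=1}^{m} f_i(u) \otimes g_i(v) + \sum_{k=m+1}^{n} \Big[ \sum_{i=1}^{m} \alpha_{k,i} f_i(u) \Big] \otimes g_k(v) \\
  &= \sum_{i=1}^{m} f_i(u) \otimes g_i(v) + \sum_{k=m+1}^{n} \sum_{i=1}^{m} \alpha_{k,i} [f_i(u) \otimes g_k(v)] \\
  &= \sum_{i=1}^{m} f_i(u) \otimes g_i(v) + \sum_{i=1}^{m} f_i(u) \otimes \Big[ \sum_{k=m+1}^{n} \alpha_{k,i} g_k(v) \Big] \\
  &= \sum_{i=1}^{m} f_i(u) \otimes \Big[ g_i(v) + \sum_{k=m+1}^{n} \alpha_{k,i} g_k(v) \Big]
\end{aligned}
$$

Thus, the linear independence of $\{f_1(u),\dots,f_m(u)\}$ implies that for each
$i \le m$,

$$g_i(v) + \sum_{k=m+1}^{n} \alpha_{k,i} g_k(v) = 0$$

for all $v \in V$ and so

$$g_i + \sum_{k=m+1}^{n} \alpha_{k,i} g_k = 0$$

But this contradicts the fact that the set $\{g_i\}$ is linearly independent. Hence, it
cannot happen that $\theta(h) = 0$ for $h \ne 0$ and so $\theta$ is injective.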


The embedding of $\mathcal{L}(U,U') \otimes \mathcal{L}(V,V')$ into $\mathcal{L}(U \otimes V, U' \otimes V')$ means that
each $\tau \otimes \sigma$ can be thought of as the linear transformation $\tau \odot \sigma$ from $U \otimes V$ to
$U' \otimes V'$, defined by

$$(\tau \odot \sigma)(u \otimes v) = \tau u \otimes \sigma v$$

In fact, the notation $\tau \otimes \sigma$ is often used to denote both the tensor product of
vectors (linear transformations) and the linear map $\tau \odot \sigma$, and we will do this as
well. In summary, we can say that the tensor product $\tau \otimes \sigma$ of linear
transformations is (up to isomorphism) a linear transformation on tensor
products.
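
In coordinates, the map $\tau \odot \sigma$ can be sketched with Kronecker products. The Python fragment below is only an illustration under the assumption that $u \otimes v$ is identified with the flattened list of products $u_j v_l$ (standard bases, lexicographic order); the helper names `kron_vec`, `kron_mat` and `apply` are hypothetical and not taken from the text.

```python
def kron_vec(u, v):
    """Coordinates of u (x) v, ordered lexicographically by index pair (j, l)."""
    return [a * b for a in u for b in v]

def kron_mat(A, B):
    """Kronecker product of matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def apply(M, x):
    """Matrix-vector product."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

A = [[1, 2], [3, 4]]          # matrix of tau: F^2 -> F^2
B = [[0, 1, 0], [1, 0, 2]]    # matrix of sigma: F^3 -> F^2
u = [1, -1]
v = [2, 0, 1]

lhs = apply(kron_mat(A, B), kron_vec(u, v))   # (tau (x) sigma)(u (x) v)
rhs = kron_vec(apply(A, u), apply(B, v))      # tau(u) (x) sigma(v)
```

Both `lhs` and `rhs` evaluate to the same coordinate vector, which is the defining identity $(\tau \odot \sigma)(u \otimes v) = \tau u \otimes \sigma v$ in this coordinate model.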


Theorem 14.8 There is a unique linear transformation

$$\theta\colon \mathcal{L}(U,U') \otimes \mathcal{L}(V,V') \to \mathcal{L}(U \otimes V, U' \otimes V')$$

defined by $\theta(\tau \otimes \sigma) = \tau \odot \sigma$ where

$$(\tau \odot \sigma)(u \otimes v) = \tau u \otimes \sigma v$$

Moreover, $\theta$ is an embedding and is an isomorphism if all vector spaces are
finite-dimensional. Thus, the tensor product $\tau \otimes \sigma$ of linear transformations is
(via this embedding) a linear transformation on tensor products. □


Let us note a few special cases of the previous theorem. 


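Corollary 14.9 Let us use the symbol $X \hookrightarrow Y$ to denote the fact that there is an
embedding of $X$ into $Y$ that is an isomorphism if $X$ and $Y$ are
finite-dimensional.

1) Taking $U' = F$ gives

$$U^* \otimes \mathcal{L}(V,V') \hookrightarrow \mathcal{L}(U \otimes V, V')$$

where

$$(f \otimes \sigma)(u \otimes v) = f(u)\sigma(v)$$

for $f \in U^*$.
2) Taking $U' = F$ and $V' = F$ gives

$$U^* \otimes V^* \hookrightarrow (U \otimes V)^*$$

where

$$(f \otimes g)(u \otimes v) = f(u)g(v)$$

3) Taking $V = F$ and noting that $\mathcal{L}(F,V') \approx V'$ and $U \otimes F \approx U$ gives
(letting $W = V'$)

$$\mathcal{L}(U,U') \otimes W \hookrightarrow \mathcal{L}(U, U' \otimes W)$$

where

$$(\tau \otimes w)(u) = \tau u \otimes w$$

4) Taking $U' = F$ and $V = F$ gives (letting $W = V'$)

$$U^* \otimes W \hookrightarrow \mathcal{L}(U,W)$$

where

$$(f \otimes w)(u) = f(u)w$$ □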




Change of Base Field 


The tensor product provides a convenient way to extend the base field of a
vector space that is more general than the complexification of a real vector
space, discussed earlier in the book. We refer to a vector space over a field $F$ as
an $F$-space and write $V_F$.


Actually, there are several approaches to “upgrading” the base field of a vector
space. For instance, suppose that $K$ is an extension field of $F$, that is, $F \subseteq K$.
If $\{b_i\}$ is a basis for $V_F$, then every $x \in V_F$ has the form

$$x = \sum r_i b_i$$

where $r_i \in F$. We can define a $K$-space $V_K$ simply by taking all formal linear
combinations of the form

$$x = \sum \alpha_i b_i$$

where $\alpha_i \in K$. Note that the dimension of $V_K$ as a $K$-space is the same as the
dimension of $V_F$ as an $F$-space. Also, $V_K$ is an $F$-space (just restrict the scalars
to $F$) and as such, the inclusion map $j\colon V_F \to V_K$ sending $x \in V_F$ to
$j(x) = x \in V_K$ is an $F$-monomorphism.


The approach described in the previous paragraph uses an arbitrarily chosen
basis for $V_F$ and is therefore not coordinate free. However, we can give a
coordinate-free approach using tensor products as follows. Since $K$ is a vector
space over $F$, we can form the tensor product

$$W_F = K \otimes_F V_F$$

It is customary to include the subscript $F$ on $\otimes_F$ to denote the fact that the
tensor product is taken with respect to the base field $F$. (All relevant maps are
$F$-bilinear and $F$-linear.) However, since $V_F$ is not a $K$-space, the only tensor
product of $K$ and $V_F$ that makes sense is the $F$-tensor product and so we will
drop the subscript $F$.


The tensor product $W_F$ is an $F$-space by definition of tensor product, but we
can make it into a $K$-space as follows. For $\alpha \in K$, the temptation is to “absorb”
the scalar $\alpha$ into the first coordinate,

$$\alpha(\beta \otimes v) = (\alpha\beta) \otimes v$$

but we must be certain that this is well-defined, that is,

$$\beta \otimes v = \gamma \otimes w \implies (\alpha\beta) \otimes v = (\alpha\gamma) \otimes w$$

But for a fixed $\alpha$, the map $(\beta,v) \mapsto (\alpha\beta) \otimes v$ is bilinear and so the universal
property of tensor products implies that there is a unique linear map
$\beta \otimes v \mapsto (\alpha\beta) \otimes v$, which we define to be scalar multiplication by $\alpha$.


To be absolutely clear, we have two distinct vector spaces: the $F$-space
$W_F = K \otimes V_F$ defined by the tensor product and the $K$-space $W_K = K \otimes V_F$
with scalar multiplication by elements of $K$ defined as absorption into the first
coordinate. The spaces $W_F$ and $W_K$ are identical as sets and as abelian groups.
It is only the “permission to multiply by” that is different. Accordingly, we can
recover $W_F$ from $W_K$ simply by restricting scalar multiplication to scalars from
$F$.

Thus, we can speak of “$F$-linear” maps $\tau$ from $V_F$ into $W_K$, with the expected
meaning, that is,

$$\tau(ru + sv) = r\tau u + s\tau v$$

for all scalars $r, s \in F$.

If the dimension of $K$ as a vector space over $F$ is $d$, then

$$\dim_F(W_F) = \dim_F(K \otimes V_F) = d \cdot \dim_F(V_F)$$

As to the dimension of $W_K$, it is not hard to see that if $\{b_i\}$ is a basis for $V_F$,
then $\{1 \otimes b_i\}$ is a basis for $W_K$. Hence

$$\dim_K(W_K) = \dim_F(V_F)$$

The map $\mu\colon V_F \to W_F$ defined by $\mu v = 1 \otimes v$ is easily seen to be injective and
$F$-linear and so $W_F$ contains an isomorphic copy of $V_F$. We can also think of $\mu$
as mapping $V_F$ into $W_K$, in which case $\mu$ is called the $K$-extension map of $V_F$.
This map has a universal property of its own, as described in the next theorem.


Theorem 14.10 The $F$-linear $K$-extension map $\mu\colon V_F \to K \otimes V_F$ has the
universal property for the family of all $F$-linear maps from $V_F$ into a $K$-space,
as measured by $K$-linear maps. Specifically, for any $F$-linear map $f\colon V_F \to Y$,
where $Y$ is a $K$-space, there exists a unique $K$-linear map $\tau\colon K \otimes V_F \to Y$ for
which the diagram in Figure 14.6 commutes, that is, for which

$$\tau \circ \mu = f$$

Proof. If such a $K$-linear map $\tau\colon K \otimes V_F \to Y$ is to exist, then it must satisfy,
for any $\beta \in K$,

$$\tau(\beta \otimes v) = \beta\tau(1 \otimes v) = \beta\tau\mu(v) = \beta f(v)$$

This shows that if $\tau$ exists, it is uniquely determined by $f$. As usual, when
searching for a linear map $\tau$ on a tensor product such as $K \otimes V_F$, we look for a
bilinear map. The map $g\colon (K \times V_F) \to Y$ defined by

$$g(\beta,v) = \beta f(v)$$

is bilinear and so there exists a unique $F$-linear map $\tau$ for which

$$\tau(\beta \otimes v) = \beta f(v)$$

It is easy to see that $\tau$ is also $K$-linear, since if $\alpha \in K$, then

$$\tau[\alpha(\beta \otimes v)] = \tau(\alpha\beta \otimes v) = \alpha\beta f(v) = \alpha\tau(\beta \otimes v)$$ □

Figure 14.6 (commutative triangle: $\mu\colon V_F \to K \otimes V_F$, $f\colon V_F \to Y$, $\tau\colon K \otimes V_F \to Y$)


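Theorem 14.10 is the key to describing how to extend an $F$-linear map to a
$K$-linear map. Figure 14.7 shows an $F$-linear map $\tau\colon V \to W$ between $F$-spaces $V$
and $W$. It also shows the $K$-extensions for both spaces, where $K \otimes V$ and
$K \otimes W$ are $K$-spaces.

Figure 14.7 (commutative square: $\tau\colon V \to W$ on top, $\mu_V$ and $\mu_W$ down the sides, $\bar\tau\colon K \otimes V \to K \otimes W$ on the bottom)

If there is a unique $K$-linear map $\bar\tau$ that makes the diagram in Figure 14.7
commute, then this would be the obvious choice for the extension of the
$F$-linear map $\tau$ to a $K$-linear map.

Consider the $F$-linear map $\sigma = (\mu_W \circ \tau)\colon V \to K \otimes W$ into the $K$-space
$K \otimes W$. Theorem 14.10 implies that there is a unique $K$-linear map
$\bar\tau\colon K \otimes V \to K \otimes W$ for which

$$\bar\tau \circ \mu_V = \sigma$$

that is,

$$\bar\tau \circ \mu_V = \mu_W \circ \tau$$

Now, $\bar\tau$ satisfies

$$
\begin{aligned}
\bar\tau(\beta \otimes v) &= \beta\bar\tau(1 \otimes v) \\
  &= \beta(\bar\tau \circ \mu_V)(v) \\
  &= \beta(\mu_W \circ \tau)(v) \\
  &= \beta(1 \otimes \tau v) \\
  &= \beta \otimes \tau v \\
  &= (\iota_K \otimes \tau)(\beta \otimes v)
\end{aligned}
$$

and so $\bar\tau = \iota_K \otimes \tau$.

Theorem 14.11 Let $V$ and $W$ be $F$-spaces, with $K$-extension maps $\mu_V$ and
$\mu_W$, respectively. (See Figure 14.7.) Then for any $F$-linear map $\tau\colon V \to W$, the
map $\iota_K \otimes \tau\colon K \otimes V \to K \otimes W$ is the unique $K$-linear map that makes the
diagram in Figure 14.7 commute, that is, for which

$$\mu_W \circ \tau = (\iota_K \otimes \tau) \circ \mu_V$$ □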
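
For a concrete feel, take $F = \mathbb{R}$ and $K = \mathbb{C}$. The Python sketch below assumes the identification of $\mathbb{C} \otimes \mathbb{R}^n$ with $\mathbb{C}^n$ via $\beta \otimes v \mapsto \beta v$, under which $\iota_K \otimes \tau$ is given by the same real matrix acting on complex vectors. The names `mu` and `apply` and the sample data are illustrative assumptions only.

```python
def apply(M, x):
    """Matrix-vector product; works for real or complex entries."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def mu(v):
    """K-extension map v -> 1 (x) v, with C (x) R^n identified with C^n."""
    return [complex(x) for x in v]

tau = [[1.0, 2.0], [0.0, -1.0]]   # a real-linear map R^2 -> R^2
v = [3.0, 4.0]
beta = 2 + 1j

# Commutativity of the square: (iota (x) tau)(mu_V v) = mu_W(tau v)
left = apply(tau, mu(v))
right = mu(apply(tau, v))

# K-linearity: (iota (x) tau)(beta (x) v) = beta (x) tau v
ext = apply(tau, [beta * x for x in mu(v)])
scaled = [beta * y for y in right]
```

The equalities `left == right` and `ext == scaled` are the coordinate versions of $\bar\tau \circ \mu_V = \mu_W \circ \tau$ and $\bar\tau(\beta \otimes v) = \beta \otimes \tau v$.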


Multilinear Maps and Iterated Tensor Products 


The tensor product operation can easily be extended to more than two vector 
spaces. We begin with the extension of the concept of bilinearity. 


Definition If $V_1,\dots,V_n$ and $W$ are vector spaces over $F$, a function
$f\colon V_1 \times \cdots \times V_n \to W$ is said to be multilinear if it is linear in each coordinate
separately, that is, if

$$
\begin{aligned}
f(u_1,\dots,u_{k-1}, rv + sv', u_{k+1},\dots,u_n)
  &= r f(u_1,\dots,u_{k-1}, v, u_{k+1},\dots,u_n) \\
  &\quad + s f(u_1,\dots,u_{k-1}, v', u_{k+1},\dots,u_n)
\end{aligned}
$$

for all $k = 1,\dots,n$. A multilinear function of $n$ variables is also referred to as
an $n$-linear function. The set of all $n$-linear functions as defined above will be
denoted by $\hom(V_1,\dots,V_n;W)$. A multilinear function from $V_1 \times \cdots \times V_n$ to
the base field $F$ is called a multilinear form or $n$-form. □


Example 14.7

1) If $A$ is an algebra, then the product map $\mu\colon A \times \cdots \times A \to A$ defined by
   $\mu(a_1,\dots,a_n) = a_1 \cdots a_n$ is $n$-linear.

2) The determinant function $\det\colon M_n \to F$ is an $n$-linear form on the columns
   of the matrices in $M_n$. □
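
The second example can be checked numerically. The following Python sketch, with a hypothetical helper `det3` that expands a $3 \times 3$ determinant along the first row, verifies linearity in one column for sample integer data; it illustrates the definition and is of course not a proof.

```python
def det3(c1, c2, c3):
    """Determinant of the 3x3 matrix whose columns are c1, c2, c3."""
    return (c1[0] * (c2[1] * c3[2] - c2[2] * c3[1])
            - c2[0] * (c1[1] * c3[2] - c1[2] * c3[1])
            + c3[0] * (c1[1] * c2[2] - c1[2] * c2[1]))

c1 = (1, 2, 3)
v = (0, 1, 4)
w = (5, 6, 0)
c3 = (2, -1, 1)
r, s = 3, -2

# The column r*v + s*w placed in the second coordinate position
combo = tuple(r * a + s * b for a, b in zip(v, w))

lhs = det3(c1, combo, c3)
rhs = r * det3(c1, v, c3) + s * det3(c1, w, c3)
```

Here `lhs` and `rhs` agree, as linearity in the second column requires. Repeating a column, as in `det3(c1, c1, c3)`, gives 0.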


The tensor product is defined via its universal property. 


Definition As pictured in Figure 14.8, let $V_1 \times \cdots \times V_n$ be the cartesian
product of vector spaces over $F$. A pair $(T, t\colon V_1 \times \cdots \times V_n \to T)$ is universal
for multilinearity if for every multilinear map $f\colon V_1 \times \cdots \times V_n \to W$, there is
a unique linear transformation $\tau\colon T \to W$ for which

$$f = \tau \circ t$$

The map $\tau$ is called the mediating morphism for $f$. If $(T,t)$ is universal for
multilinearity, then $T$ is called the tensor product of $V_1,\dots,V_n$ and denoted by
$V_1 \otimes \cdots \otimes V_n$. The map $t$ is called the tensor map. □

Figure 14.8 (commutative triangle: $t\colon V_1 \times \cdots \times V_n \to V_1 \otimes \cdots \otimes V_n$, $f\colon V_1 \times \cdots \times V_n \to W$, $\tau\colon V_1 \otimes \cdots \otimes V_n \to W$)

As we have seen, the tensor product is unique up to isomorphism.


The basis construction and coordinate-free construction given earlier for the 
tensor product of two vector spaces carry over to the multilinear case. 


In particular, let $B_i = \{e_{i,j} \mid j \in J_i\}$ be a basis for $V_i$ for $i = 1,\dots,n$. For each
ordered $n$-tuple $(e_{1,i_1},\dots,e_{n,i_n})$, construct a new formal symbol
$e_{1,i_1} \otimes \cdots \otimes e_{n,i_n}$ and define $T$ to be the vector space with basis

$$D = \{ e_{1,i_1} \otimes \cdots \otimes e_{n,i_n} \mid i_k \in J_k \}$$

The tensor map $t\colon V_1 \times \cdots \times V_n \to T$ is defined by setting

$$t(e_{1,i_1},\dots,e_{n,i_n}) = e_{1,i_1} \otimes \cdots \otimes e_{n,i_n}$$

and extending by multilinearity. This uniquely defines a multilinear map $t$ that is
universal for multilinear functions from $V_1 \times \cdots \times V_n$.

Indeed, if $f\colon V_1 \times \cdots \times V_n \to W$ is multilinear, the condition $f = \tau \circ t$ is
equivalent to

$$\tau(e_{1,i_1} \otimes \cdots \otimes e_{n,i_n}) = f(e_{1,i_1},\dots,e_{n,i_n})$$

which uniquely defines a linear map $\tau\colon T \to W$. Hence, $(T,t)$ has the universal
property for multilinearity.


Alternatively, we may take the coordinate-free quotient space approach as 
follows. 


Definition Let $V_1,\dots,V_n$ be vector spaces over $F$ and let $\mathcal{F}$ be the vector space
with basis $V_1 \times \cdots \times V_n$. Let $S$ be the subspace of $\mathcal{F}$ generated by all vectors of
the form

$$
\begin{aligned}
& r(v_1,\dots,v_{k-1}, u, v_{k+1},\dots,v_n) + s(v_1,\dots,v_{k-1}, u', v_{k+1},\dots,v_n) \\
& \quad - (v_1,\dots,v_{k-1}, ru + su', v_{k+1},\dots,v_n)
\end{aligned}
$$

for $r,s \in F$, $u,u' \in V_k$ and $v_i \in V_i$ for $i \ne k$. The quotient space $\mathcal{F}/S$ is the
tensor product of $V_1,\dots,V_n$ and the tensor map is the map

$$t(v_1,\dots,v_n) = (v_1,\dots,v_n) + S$$ □

As before, we denote the coset $(v_1,\dots,v_n) + S$ by $v_1 \otimes \cdots \otimes v_n$ and so any
element of $V_1 \otimes \cdots \otimes V_n$ is a sum of decomposable tensors, that is,

$$\sum v_{i_1} \otimes \cdots \otimes v_{i_n}$$

where the vector space operations are linear in each variable.


Here are some of the basic properties of multiple tensor products. Proof is left to 
the reader. 


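Theorem 14.12 The tensor product has the following properties. Note that all
vector spaces are over the same field $F$.
1) (Associativity) There exists an isomorphism

$$\tau\colon (V_1 \otimes \cdots \otimes V_n) \otimes (W_1 \otimes \cdots \otimes W_m) \to V_1 \otimes \cdots \otimes V_n \otimes W_1 \otimes \cdots \otimes W_m$$

for which

$$\tau[(v_1 \otimes \cdots \otimes v_n) \otimes (w_1 \otimes \cdots \otimes w_m)] = v_1 \otimes \cdots \otimes v_n \otimes w_1 \otimes \cdots \otimes w_m$$

In particular,

$$(U \otimes V) \otimes W \approx U \otimes (V \otimes W) \approx U \otimes V \otimes W$$

2) (Commutativity) Let $\pi$ be any permutation of the indices $\{1,\dots,n\}$. Then
there is an isomorphism

$$\sigma\colon V_1 \otimes \cdots \otimes V_n \to V_{\pi(1)} \otimes \cdots \otimes V_{\pi(n)}$$

for which

$$\sigma(v_1 \otimes \cdots \otimes v_n) = v_{\pi(1)} \otimes \cdots \otimes v_{\pi(n)}$$

3) There is an isomorphism $\rho_1\colon F \otimes V \to V$ for which

$$\rho_1(r \otimes v) = rv$$

and similarly, there is an isomorphism $\rho_2\colon V \otimes F \to V$ for which

$$\rho_2(v \otimes r) = rv$$

Hence, $F \otimes V \approx V \approx V \otimes F$. □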


The analog of Theorem 14.4 is the following. 


Theorem 14.13 Let $V_1,\dots,V_n$ and $W$ be vector spaces over $F$. Then the
mediating morphism map

$$\phi\colon \hom(V_1,\dots,V_n;W) \to \mathcal{L}(V_1 \otimes \cdots \otimes V_n, W)$$

defined by the fact that $\phi f$ is the unique mediating morphism for $f$ is an
isomorphism. Thus,

$$\hom(V_1,\dots,V_n;W) \approx \mathcal{L}(V_1 \otimes \cdots \otimes V_n, W)$$

Moreover, if all vector spaces are finite-dimensional, then

$$\dim[\hom(V_1,\dots,V_n;W)] = \dim(W) \cdot \prod_{i=1}^{n} \dim(V_i)$$ □


Theorem 14.8 and its corollary can also be extended. 


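Theorem 14.14 The linear transformation

$$\theta\colon \mathcal{L}(U_1,U_1') \otimes \cdots \otimes \mathcal{L}(U_n,U_n') \to \mathcal{L}(U_1 \otimes \cdots \otimes U_n, U_1' \otimes \cdots \otimes U_n')$$

defined by

$$\theta(\tau_1 \otimes \cdots \otimes \tau_n)(u_1 \otimes \cdots \otimes u_n) = \tau_1 u_1 \otimes \cdots \otimes \tau_n u_n$$

is an embedding and is an isomorphism if all vector spaces are
finite-dimensional. Thus, the tensor product $\tau_1 \otimes \cdots \otimes \tau_n$ of linear transformations is
(via this embedding) a linear transformation on tensor products. Two important
special cases of this are

$$U_1^* \otimes \cdots \otimes U_n^* \hookrightarrow (U_1 \otimes \cdots \otimes U_n)^*$$

where

$$(f_1 \otimes \cdots \otimes f_n)(u_1 \otimes \cdots \otimes u_n) = f_1(u_1) \cdots f_n(u_n)$$

and

$$U_1^* \otimes \cdots \otimes U_n^* \otimes V \hookrightarrow \mathcal{L}(U_1 \otimes \cdots \otimes U_n, V)$$

where

$$(f_1 \otimes \cdots \otimes f_n \otimes v)(u_1 \otimes \cdots \otimes u_n) = f_1(u_1) \cdots f_n(u_n)\, v$$ □

Here $X \hookrightarrow Y$ means that $X$ embeds in $Y$, with the embedding an isomorphism in the finite-dimensional case.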


Tensor Spaces 


Let $V$ be a finite-dimensional vector space. For nonnegative integers $p$ and $q$,
the tensor product

$$T^p_q(V) = \underbrace{V \otimes \cdots \otimes V}_{p \text{ factors}} \otimes \underbrace{V^* \otimes \cdots \otimes V^*}_{q \text{ factors}} = V^{\otimes p} \otimes (V^*)^{\otimes q}$$

is called the space of tensors of type $(p,q)$, where $p$ is the contravariant type
and $q$ is the covariant type. If $p = q = 0$, then $T^0_0(V) = F$, the base field. Here
we use the notation $V^{\otimes n}$ for the $n$-fold tensor product of $V$ with itself. We will
also write $V^{\times n}$ for the $n$-fold cartesian product of $V$ with itself.

Since $V \approx V^{**}$, we have

$$T^p_q(V) = V^{\otimes p} \otimes (V^*)^{\otimes q} \approx ((V^*)^{\otimes p} \otimes V^{\otimes q})^* \approx \hom_F((V^*)^{\times p} \times V^{\times q}, F)$$

which is the space of all multilinear functionals on

$$\underbrace{V^* \times \cdots \times V^*}_{p \text{ factors}} \times \underbrace{V \times \cdots \times V}_{q \text{ factors}}$$

In fact, tensors of type $(p,q)$ are often defined as multilinear functionals in this
way.

Note that

$$\dim(T^p_q(V)) = [\dim(V)]^{p+q}$$

Also, the associativity and commutativity of tensor products allows us to write

$$T^p_q(V) \otimes T^r_s(V) = T^{p+r}_{q+s}(V)$$

at least up to isomorphism.


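Tensors of type $(p, 0)$ are called contravariant tensors

$$T^p(V) = T^p_0(V) = \underbrace{V \otimes \cdots \otimes V}_{p \text{ factors}}$$

and tensors of type $(0, q)$ are called covariant tensors

$$T_q(V) = T^0_q(V) = \underbrace{V^* \otimes \cdots \otimes V^*}_{q \text{ factors}}$$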


Tensors with both contravariant and covariant indices are called mixed tensors. 


In general, a tensor can be interpreted in a variety of ways as a multilinear map 
on a cartesian product, or a linear map on a tensor product. Indeed, the 
interpretation we mentioned above that is sometimes used as the definition is 
only one possibility. We simply need to decide how many of the contravariant 
factors and how many of the covariant factors should be “active participants” 
and how many should be “passive participants.” 


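More specifically, consider a tensor of type $(p, q)$, written

$$v_1 \otimes \cdots \otimes v_m \otimes \cdots \otimes v_p \otimes f_1 \otimes \cdots \otimes f_n \otimes \cdots \otimes f_q \in T^p_q(V)$$

where $m \le p$ and $n \le q$. Here we are choosing the first $m$ vectors and the first
$n$ linear functionals as active participants. This determines the number of
arguments of the map. In fact, we define a map from the cartesian product

$$\underbrace{V^* \times \cdots \times V^*}_{m \text{ factors}} \times \underbrace{V \times \cdots \times V}_{n \text{ factors}}$$

to the tensor product

$$\underbrace{V \otimes \cdots \otimes V}_{p-m \text{ factors}} \otimes \underbrace{V^* \otimes \cdots \otimes V^*}_{q-n \text{ factors}}$$

of the remaining factors by

$$
\begin{aligned}
& (v_1 \odot \cdots \odot v_p \odot f_1 \odot \cdots \odot f_q)(h_1,\dots,h_m, x_1,\dots,x_n) \\
& \quad = h_1(v_1) \cdots h_m(v_m)\, f_1(x_1) \cdots f_n(x_n)\; v_{m+1} \otimes \cdots \otimes v_p \otimes f_{n+1} \otimes \cdots \otimes f_q
\end{aligned}
$$

In words, the first group $v_1 \otimes \cdots \otimes v_m$ of (active) vectors interacts with the first
group $h_1,\dots,h_m$ of arguments to produce the scalar $h_1(v_1) \cdots h_m(v_m)$. The first
group $f_1 \otimes \cdots \otimes f_n$ of (active) functionals interacts with the second group
$x_1,\dots,x_n$ of arguments to produce the scalar $f_1(x_1) \cdots f_n(x_n)$. The remaining
(passive) vectors $v_{m+1} \otimes \cdots \otimes v_p$ and functionals $f_{n+1} \otimes \cdots \otimes f_q$ are just
“copied” to the image tensor.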


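It is easy to see that this map is multilinear and so there is a unique linear map
from the tensor product

$$\underbrace{V^* \otimes \cdots \otimes V^*}_{m \text{ factors}} \otimes \underbrace{V \otimes \cdots \otimes V}_{n \text{ factors}}$$

to the tensor product

$$\underbrace{V \otimes \cdots \otimes V}_{p-m \text{ factors}} \otimes \underbrace{V^* \otimes \cdots \otimes V^*}_{q-n \text{ factors}}$$

defined by

$$
\begin{aligned}
& (v_1 \odot \cdots \odot v_p \odot f_1 \odot \cdots \odot f_q)(h_1 \otimes \cdots \otimes h_m \otimes x_1 \otimes \cdots \otimes x_n) \\
& \quad = h_1(v_1) \cdots h_m(v_m)\, f_1(x_1) \cdots f_n(x_n)\; v_{m+1} \otimes \cdots \otimes v_p \otimes f_{n+1} \otimes \cdots \otimes f_q
\end{aligned}
$$

Moreover, the map

$$\phi\colon V^{\otimes p} \otimes (V^*)^{\otimes q} \to \mathcal{L}\big((V^*)^{\otimes m} \otimes V^{\otimes n},\; V^{\otimes (p-m)} \otimes (V^*)^{\otimes (q-n)}\big)$$

defined by

$$\phi(v_1 \otimes \cdots \otimes v_p \otimes f_1 \otimes \cdots \otimes f_q) = v_1 \odot \cdots \odot v_p \odot f_1 \odot \cdots \odot f_q$$

is an isomorphism, since if $v_1 \odot \cdots \odot v_p \odot f_1 \odot \cdots \odot f_q$ is the zero map then

$$h_1(v_1) \cdots h_m(v_m)\, f_1(x_1) \cdots f_n(x_n)\; v_{m+1} \otimes \cdots \otimes v_p \otimes f_{n+1} \otimes \cdots \otimes f_q = 0$$

for all $h_i \in V^*$ and $x_i \in V$, which implies that

$$v_1 \otimes \cdots \otimes v_p \otimes f_1 \otimes \cdots \otimes f_q = 0$$

As usual, we denote the map $v_1 \odot \cdots \odot v_p \odot f_1 \odot \cdots \odot f_q$ by
$v_1 \otimes \cdots \otimes v_p \otimes f_1 \otimes \cdots \otimes f_q$.

Theorem 14.15 For $0 \le m \le p$ and $0 \le n \le q$,

$$T^p_q(V) \approx \mathcal{L}\big((V^*)^{\otimes m} \otimes V^{\otimes n},\; V^{\otimes (p-m)} \otimes (V^*)^{\otimes (q-n)}\big)$$ □

When $m = p$ and $n = q$, we get

$$T^p_q(V) \approx \mathcal{L}((V^*)^{\otimes p} \otimes V^{\otimes q}, F) = ((V^*)^{\otimes p} \otimes V^{\otimes q})^*$$

as before.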


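Let us look at some special cases. For $q = 0$ we have

$$T^p_0(V) \approx \mathcal{L}((V^*)^{\otimes m}, V^{\otimes (p-m)})$$

where

$$(v_1 \otimes \cdots \otimes v_p)(h_1 \otimes \cdots \otimes h_m) = h_1(v_1) \cdots h_m(v_m)\; v_{m+1} \otimes \cdots \otimes v_p$$

When $p = q = 1$, we get for $m = 0$ and $n = 1$,

$$T^1_1(V) \approx \mathcal{L}(F \otimes V, V \otimes F) \approx \mathcal{L}(V)$$

where

$$(v \otimes f)(w) = f(w)v$$

and for $m = 1$ and $n = 0$,

$$T^1_1(V) \approx \mathcal{L}(V^* \otimes F, F \otimes V^*) \approx \mathcal{L}(V^*, V^*)$$

where

$$(v \otimes f)(h) = h(v) f$$

Finally, when $m = n = 1$, we get a multilinear form

$$(v \otimes f)(h, w) = h(v) f(w)$$

Consider also a tensor $f \otimes g$ of type $(0,2)$. When $n = q = 2$ we get a
multilinear functional $f \otimes g\colon (V \times V) \to F$ defined by

$$(f \otimes g)(v, w) = f(v)g(w)$$

This is just a bilinear form on $V$.
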
Contraction 


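Covariant and contravariant factors can be “combined” in the following way.
Consider the map

$$h\colon V^{\times p} \times (V^*)^{\times q} \to T^{p-1}_{q-1}(V)$$

defined by

$$h(v_1,\dots,v_p, f_1,\dots,f_q) = f_1(v_1)(v_2 \otimes \cdots \otimes v_p \otimes f_2 \otimes \cdots \otimes f_q)$$

This is easily seen to be multilinear and so there is a unique linear map

$$\theta\colon T^p_q(V) \to T^{p-1}_{q-1}(V)$$

defined by

$$\theta(v_1 \otimes \cdots \otimes v_p \otimes f_1 \otimes \cdots \otimes f_q) = f_1(v_1)(v_2 \otimes \cdots \otimes v_p \otimes f_2 \otimes \cdots \otimes f_q)$$

This is called the contraction in the contravariant index 1 and covariant index
1. Of course, contraction in other indices (one contravariant and one covariant)
can be defined similarly.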


Example 14.8 Let $\dim(V) > 1$ and consider the tensor space $T^1_1(V)$, which is
isomorphic to $\mathcal{L}(V)$ via the map

$$(v \otimes f)(w) = f(w)v$$

For a “decomposable” linear operator of the form $v \otimes f$ as defined above with
$v \ne 0$ and $f \ne 0$, we have $\ker(v \otimes f) = \ker(f)$, which has codimension 1.
Hence, if $f(w)v = (v \otimes f)(w) \ne 0$, then

$$V = \langle w \rangle \oplus \ker(f) = \langle w \rangle \oplus \mathcal{E}_0$$

where $\mathcal{E}_0$ is the eigenspace of $v \otimes f$ associated with the eigenvalue 0.
In particular, if $f(v) \ne 0$, then

$$(v \otimes f)(v) = f(v)v$$

and so $v$ is an eigenvector for the nonzero eigenvalue $f(v)$. Hence,

$$V = \langle v \rangle \oplus \mathcal{E}_0 = \mathcal{E}_{f(v)} \oplus \mathcal{E}_0$$

and so the trace of $v \otimes f$ is

$$\operatorname{tr}(v \otimes f) = f(v) = \theta(v \otimes f)$$

where $\theta$ is the contraction map. □
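
The identity $\operatorname{tr}(v \otimes f) = f(v)$ is easy to check in coordinates. In the Python sketch below we assume that $f$ is given by its row of coefficients with respect to the standard basis, so that $v \otimes f$ has matrix entries $v_i f_j$; the helper names are hypothetical.

```python
def rank_one_operator(v, f):
    """Matrix of v (x) f : w -> f(w) v, where f is a row of coefficients."""
    return [[vi * fj for fj in f] for vi in v]

def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

def contraction(v, f):
    """Contraction of v (x) f, that is, the scalar f(v)."""
    return sum(fi * vi for fi, vi in zip(f, v))

v = [1, 2, 3]
f = [4, 0, -1]

M = rank_one_operator(v, f)
tr_value = trace(M)
contracted = contraction(v, f)
```

For this sample data both `tr_value` and `contracted` equal $f(v) = 4 + 0 - 3 = 1$.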
The Tensor Algebra of V 
Consider the contravariant tensor spaces

$$T^p(V) = T^p_0(V) = V^{\otimes p}$$

For $p = 0$ we take $T^0(V) = F$. The external direct sum

$$T(V) = \bigoplus_{p=0}^{\infty} T^p(V)$$

of these tensor spaces is a vector space with the property that

$$T^p(V) \otimes T^q(V) = T^{p+q}(V)$$

This is an example of a graded algebra, where $T^p(V)$ are the elements of grade
$p$. The graded algebra $T(V)$ is called the tensor algebra over $V$. (We will
formally define graded structures a bit later in the chapter.)

Since

$$T_q(V) = \underbrace{V^* \otimes \cdots \otimes V^*}_{q \text{ factors}} = T^q(V^*)$$

there is no need to look separately at $T_q(V)$.

Special Multilinear Maps 
The following definitions describe some special types of multilinear maps.

Definition

1) A multilinear map $f\colon V^{\times n} \to W$ is symmetric if interchanging any two
   coordinate positions changes nothing, that is, if

$$f(v_1,\dots,v_i,\dots,v_j,\dots,v_n) = f(v_1,\dots,v_j,\dots,v_i,\dots,v_n)$$

   for any $i \ne j$.

2) A multilinear map $f\colon V^{\times n} \to W$ is antisymmetric or skew-symmetric if
   interchanging any two coordinate positions introduces a factor of $-1$, that
   is, if

$$f(v_1,\dots,v_i,\dots,v_j,\dots,v_n) = -f(v_1,\dots,v_j,\dots,v_i,\dots,v_n)$$

   for $i \ne j$.

3) A multilinear map $f\colon V^{\times n} \to W$ is alternate or alternating if

$$v_i = v_j \text{ for some } i \ne j \implies f(v_1,\dots,v_n) = 0$$ □

As in the case of bilinear forms, we have some relationships between these
concepts. In particular, if $\operatorname{char}(F) = 2$, then

$$\text{alternate} \implies \text{symmetric} \iff \text{skew-symmetric}$$

and if $\operatorname{char}(F) \ne 2$, then

$$\text{alternate} \iff \text{skew-symmetric}$$


A few remarks about permutations are in order. A permutation of the set
$N = \{1,\dots,n\}$ is a bijective function $\pi\colon N \to N$. We denote the group (under
composition) of all such permutations by $S_n$. This is the symmetric group on $n$
symbols. A cycle of length $k$ is a permutation of the form $(i_1, i_2,\dots, i_k)$, which
sends $i_u$ to $i_{u+1}$ for $u = 1,\dots,k-1$ and also sends $i_k$ to $i_1$. All other elements
of $N$ are left fixed. Every permutation is the product (composition) of disjoint
cycles.

A transposition is a cycle $(i, j)$ of length 2. Every cycle (and therefore every
permutation) is the product of transpositions. In general, a permutation can be
expressed as a product of transpositions in many ways. However, no matter how
one represents a given permutation as such a product, the number of
transpositions is either always even or always odd. Therefore, we can define the
parity of a permutation $\pi \in S_n$ to be the parity of the number of transpositions
in any decomposition of $\pi$ as a product of transpositions. The sign of a
permutation is defined by

$$\operatorname{sg}(\pi) = \begin{cases} 1 & \pi \text{ has even parity} \\ -1 & \pi \text{ has odd parity} \end{cases}$$

If $\operatorname{sg}(\pi) = 1$, then $\pi$ is an even permutation and if $\operatorname{sg}(\pi) = -1$, then $\pi$ is an
odd permutation. The sign of $\pi$ is often written $(-1)^{\pi}$.
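
The sign can be computed from the disjoint cycle decomposition, since a cycle of length $k$ is a product of $k - 1$ transpositions. The Python sketch below assumes permutations of $\{0,\dots,n-1\}$ stored as tuples with `perm[i]` the image of `i` (a 0-based convention of ours, not the text's); `sign` and `compose` are illustrative names.

```python
from itertools import permutations

def sign(perm):
    """Sign of a permutation of {0, ..., n-1}; perm[i] is the image of i."""
    seen = [False] * len(perm)
    s = 1
    for start in range(len(perm)):
        if seen[start]:
            continue
        # Walk the cycle containing `start` and record its length.
        length = 0
        j = start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        # A cycle of length k is a product of k - 1 transpositions.
        if length % 2 == 0:
            s = -s
    return s

def compose(p, q):
    """The permutation i -> p[q[i]] (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

# Check multiplicativity of the sign on all of S_3.
all_s3 = list(permutations(range(3)))
multiplicative = all(
    sign(compose(p, q)) == sign(p) * sign(q) for p in all_s3 for q in all_s3
)
```

The final check reflects the fact that the parity of a product of permutations is the sum of the parities.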


With these facts in mind, it is apparent that $f$ is symmetric if and only if

$$f(v_1,\dots,v_n) = f(v_{\pi(1)},\dots,v_{\pi(n)})$$

for all permutations $\pi \in S_n$ and that $f$ is skew-symmetric if and only if

$$f(v_1,\dots,v_n) = (-1)^{\pi} f(v_{\pi(1)},\dots,v_{\pi(n)})$$

for all permutations $\pi \in S_n$.

A word of caution is in order with respect to the notation above, which is very
convenient albeit somewhat prone to confusion. It is intended that a permutation
$\pi$ permutes the coordinate positions in $f$, not the indices (despite appearances).
Suppose, for example, that $f\colon \mathbb{R}^2 \times \mathbb{R}^2 \to X$ and that $\{e_1, e_2\}$ is a basis for $\mathbb{R}^2$.
If $\pi = (1\,2)$, then $\pi$ applied to $f(e_1,e_1)$ gives $f(e_1,e_1)$ and not $f(e_2,e_2)$, since
$\pi$ permutes the two coordinate positions in $f(v_1, v_2)$.


Graded Algebras 


We need to pause for a few definitions that are useful in discussing tensor
algebras. An algebra $A$ over $F$ is said to be a graded algebra if as a vector
space over $F$, $A$ can be written in the form

$$A = \bigoplus_{i=0}^{\infty} A_i$$

for subspaces $A_i$ of $A$, and where multiplication behaves nicely, that is,

$$A_i A_j \subseteq A_{i+j}$$

The elements of $A_i$ are said to be homogeneous of degree $i$. If $a \in A$ is written

$$a = a_{i_1} + \cdots + a_{i_n}$$

for $a_{i_k} \in A_{i_k}$, $i_k \ne i_j$, then $a_{i_k}$ is called the homogeneous component of $a$ of
degree $i_k$.

The ring of polynomials $F[x]$ provides a prime example of a graded algebra,
since

$$F[x] = \bigoplus_{i=0}^{\infty} F_i[x]$$

where $F_i[x]$ is the subspace of $F[x]$ consisting of all scalar multiples of $x^i$.

More generally, the ring $F[x_1,\dots,x_n]$ of polynomials in several variables is a
graded algebra, since it is the direct sum of the subspaces of homogeneous
polynomials of degree $i$. (A polynomial is homogeneous of degree $i$ if each
term has degree $i$. For example, $p = x_1 x_2^2 + x_1 x_2 x_3$ is homogeneous of degree
3.)
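
To make the grading on a polynomial ring concrete, here is a small Python sketch. It assumes a polynomial is stored as a dictionary from exponent tuples to coefficients (a representation chosen here for illustration), splits it into homogeneous components and checks on one example that degrees add under multiplication.

```python
from collections import defaultdict

def components(poly):
    """Split a polynomial {exponent tuple: coefficient} by total degree."""
    comps = defaultdict(dict)
    for exps, coeff in poly.items():
        comps[sum(exps)][exps] = coeff
    return dict(comps)

def multiply(p, q):
    """Product of two polynomials in the same number of variables."""
    out = defaultdict(int)
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            out[tuple(a + b for a, b in zip(e1, e2))] += c1 * c2
    return {e: c for e, c in out.items() if c != 0}

# p = x1*x2^2 + x1*x2*x3 is homogeneous of degree 3
p = {(1, 2, 0): 1, (1, 1, 1): 1}
# q = x1 + x3 is homogeneous of degree 1
q = {(1, 0, 0): 1, (0, 0, 1): 1}

degrees_p = set(components(p))
degrees_pq = set(components(multiply(p, q)))
```

Here `degrees_p` is `{3}` and `degrees_pq` is `{4}`, in line with the requirement that the product of homogeneous elements of degrees $i$ and $j$ lies in degree $i + j$.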


The Symmetric and Antisymmetric Tensor Algebras 


Our discussion of symmetric and antisymmetric tensors will benefit from
stating a few definitions and setting a bit of notation at the outset.

Let $F_p[e_1,\dots,e_n]$ denote the vector space of all homogeneous polynomials of
degree $p$ (together with the zero polynomial) in the independent variables
$e_1,\dots,e_n$. As is sometimes done in this context, we denote the product in
$F_p[e_1,\dots,e_n]$ by $\vee$, for example, writing $e_1 e_2 e_3$ as $e_1 \vee e_2 \vee e_3$. The algebra of
all polynomials in $e_1,\dots,e_n$ is denoted by $F[e_1,\dots,e_n]$.

We will also need the counterpart of $F_p[e_1,\dots,e_n]$ in which multiplication acts
anticommutatively, that is, $e_i e_j = -e_j e_i$.


Definition Let $E = (e_1,\dots,e_n)$ be a sequence of independent variables. For
$p \le n$, let $F_p^-[e_1,\dots,e_n]$ be the vector space over $F$ with basis

$$A_p(E) = \{ e_{i_1} \cdots e_{i_p} \mid i_1 < i_2 < \cdots < i_p \}$$

consisting of all words of length $p$ over $E$ that are in ascending order. Let
$F_0^-[e_1,\dots,e_n] = F\varepsilon$, which we identify with $F$ by identifying $\varepsilon$ with $1 \in F$.
Define a product on the direct sum

$$F^-[e_1,\dots,e_n] = \bigoplus_{p=0}^{n} F_p^-$$

as follows. First, the product $f \wedge g$ of monomials $f = x_1 \cdots x_p \in F_p^-$ and
$g = y_1 \cdots y_q \in F_q^-$ is defined as follows:

1) If $x_1 \cdots x_p y_1 \cdots y_q$ has a repeated factor then $f \wedge g = 0$.

2) Otherwise, reorder $x_1 \cdots x_p y_1 \cdots y_q$ in ascending order, say $z_1 \cdots z_{p+q}$, via the
   permutation $\sigma$ and set

$$f \wedge g = (-1)^{\sigma} z_1 \cdots z_{p+q}$$

Extend the product by distributivity to $F^-[e_1,\dots,e_n]$. The resulting product
makes $F^-[e_1,\dots,e_n]$ into a (noncommutative) algebra over $F$. This product is
called the wedge product or exterior product on $F^-[e_1,\dots,e_n]$. □

For example, by definition of wedge product,

$$e_2 \wedge e_1 \wedge e_3 = -e_1 \wedge e_2 \wedge e_3$$
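
The two rules defining the wedge product of monomials translate directly into code. In the Python sketch below a monomial $e_{i_1} \cdots e_{i_p}$ is represented by its tuple of indices, and the sign of the sorting permutation is computed by counting inversions; both the representation and the name `wedge_monomials` are assumptions made for illustration.

```python
def wedge_monomials(f, g):
    """Wedge product of two monomials given as tuples of indices.

    Returns (sign, word): sign is 0 when a factor is repeated (the product
    is zero), otherwise +1 or -1, and word is the ascending index tuple.
    """
    word = list(f) + list(g)
    if len(set(word)) < len(word):
        return 0, ()
    # The parity of the sorting permutation equals the parity of the
    # number of inversions in the concatenated word.
    inversions = sum(
        1
        for i in range(len(word))
        for j in range(i + 1, len(word))
        if word[i] > word[j]
    )
    return (-1) ** inversions, tuple(sorted(word))

example = wedge_monomials((2,), (1, 3))     # e2 ^ (e1 e3)
repeated = wedge_monomials((1,), (1, 2))    # e1 ^ (e1 e2)
```

With this convention `example` is `(-1, (1, 2, 3))`, matching $e_2 \wedge e_1 \wedge e_3 = -e_1 \wedge e_2 \wedge e_3$, and `repeated` signals the zero product.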


Let $B = \{e_1,\dots,e_n\}$ be a basis for $V$. It will be convenient to group the
decomposable basis tensors $e_{i_1} \otimes \cdots \otimes e_{i_p}$ according to their index multiset.
Specifically, for each multiset $M = \{i_1,\dots,i_p\}$ with $1 \le i_k \le n$, let $G_M$ be the
set of all tensors

$$e_{k_1} \otimes \cdots \otimes e_{k_p}$$

where $(k_1,\dots,k_p)$ is a permutation of $\{i_1,\dots,i_p\}$. For example, if
$M = \{2, 2, 3\}$, then

$$G_M = \{ e_2 \otimes e_2 \otimes e_3,\; e_2 \otimes e_3 \otimes e_2,\; e_3 \otimes e_2 \otimes e_2 \}$$

If $v \in T^p(V)$ has the form

$$v = \sum_{i_1,\dots,i_p} \alpha_{i_1,\dots,i_p}\, e_{i_1} \otimes \cdots \otimes e_{i_p}$$

where $\alpha_{i_1,\dots,i_p} \ne 0$, then let $G_M(v)$ be the subset of $G_M$ whose elements appear
in the sum for $v$. For example, if

$$v = 2 e_2 \otimes e_2 \otimes e_3 + 3 e_2 \otimes e_3 \otimes e_2 + e_3 \otimes e_3 \otimes e_2$$

then

$$G_{\{2,2,3\}}(v) = \{ e_2 \otimes e_2 \otimes e_3,\; e_2 \otimes e_3 \otimes e_2 \}$$

Let $S_M(v)$ denote the sum of the terms of $v$ associated with $G_M(v)$. For
example,

$$S_{\{2,2,3\}}(v) = 2 e_2 \otimes e_2 \otimes e_3 + 3 e_2 \otimes e_3 \otimes e_2$$

Thus, $v$ can be written in the form

$$v = \sum_{M} S_M(v) = \sum_{M} \sum_{t \in G_M(v)} \alpha_t t$$

where the sum is over a collection of multisets $M$ with $S_M(v) \ne 0$. Note also
that $\alpha_t \ne 0$ since $t \in G_M(v)$. Finally, let

$$u_M = e_{i_1} \otimes \cdots \otimes e_{i_p}$$

be the unique member of $G_M$ for which $i_1 \le i_2 \le \cdots \le i_p$.
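
The bookkeeping with index multisets can be sketched in a few lines of Python. Here a tensor is assumed to be stored as a dictionary from index tuples $(i_1,\dots,i_p)$ to nonzero coefficients, and each multiset $M$ is represented by its sorted tuple, which is also the index tuple of $u_M$. The sample tensor is illustrative and the function name is hypothetical.

```python
from collections import defaultdict

def group_by_multiset(v):
    """Group the terms of v by index multiset.

    v maps index tuples to coefficients. The result maps each multiset M
    (as a sorted tuple) to the sub-dictionary of terms in G_M(v), that is,
    to a representation of the partial sum S_M(v).
    """
    groups = defaultdict(dict)
    for idx, coeff in v.items():
        if coeff != 0:
            groups[tuple(sorted(idx))][idx] = coeff
    return dict(groups)

# 2 e2(x)e2(x)e3 + 3 e2(x)e3(x)e2 + e1(x)e3(x)e3
v = {(2, 2, 3): 2, (2, 3, 2): 3, (1, 3, 3): 1}
groups = group_by_multiset(v)
```

In this example `groups[(2, 2, 3)]` holds the two terms with index multiset $\{2,2,3\}$, and `groups[(1, 3, 3)]` holds the single remaining term.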


Now we can get to the business at hand. 
Symmetric and Antisymmetric Tensors 


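Let $S_p$ be the symmetric group on $\{1,\dots,p\}$. For each $\sigma \in S_p$, the multilinear
map $f_\sigma\colon V^{\times p} \to T^p(V)$ defined by

$$f_\sigma(x_1,\dots,x_p) = x_{\sigma 1} \otimes \cdots \otimes x_{\sigma p}$$

determines a unique linear operator $\lambda_\sigma$ on $T^p(V)$ for which

$$\lambda_\sigma(x_1 \otimes \cdots \otimes x_p) = x_{\sigma 1} \otimes \cdots \otimes x_{\sigma p}$$

For example, if $p = 3$ and $\sigma = (1\,2)$, then

$$\lambda_{(1\,2)}(v_1 \otimes v_3 \otimes v_2) = v_3 \otimes v_1 \otimes v_2$$

Let $B = \{e_1,\dots,e_n\}$ be a basis for $V$. Since $\lambda_\sigma$ is a bijection of the basis

$$\mathcal{B} = \{ e_{i_1} \otimes \cdots \otimes e_{i_p} \mid e_{i_j} \in B \}$$

it follows that $\lambda_\sigma$ is an isomorphism of $T^p(V)$. Note also that $\lambda_\sigma$ is a
permutation of each $G_M$, that is, the sets $G_M$ are invariant under $\lambda_\sigma$.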

Definition Let $V$ be a finite-dimensional vector space.

1) A tensor $t \in T^p(V)$ is symmetric if

$$\lambda_\sigma t = t$$

   for all permutations $\sigma \in S_p$. The set of all symmetric tensors

$$ST^p(V) = \{ t \in T^p(V) \mid \lambda_\sigma t = t \text{ for all } \sigma \in S_p \}$$

   is a subspace of $T^p(V)$, called the symmetric tensor space of degree $p$
   over $V$.
2) A tensor $t \in T^p(V)$ is antisymmetric if

$$\lambda_\sigma t = (-1)^{\sigma} t$$

   for all permutations $\sigma \in S_p$. The set of all antisymmetric tensors

$$AT^p(V) = \{ t \in T^p(V) \mid \lambda_\sigma t = (-1)^{\sigma} t \text{ for all } \sigma \in S_p \}$$

   is a subspace of $T^p(V)$, called the antisymmetric tensor space or exterior
   product space of degree $p$ over $V$. □

We can develop the theory of symmetric and antisymmetric tensors in tandem.
Accordingly, let us write (anti)symmetric to denote a tensor that is either
symmetric or antisymmetric.


Since for any $s,t \in G_M$, there is a permutation $\lambda_\sigma$ taking $s$ to $t$, an
(anti)symmetric tensor $v$ must have $G_M(v) = G_M$ and so

$$v = \sum_{M} S_M(v) = \sum_{M} \Big( \sum_{t \in G_M} \alpha_t t \Big)$$

Since $\lambda_\sigma$ is a permutation of $G_M$, it follows that $v$ is symmetric if and only if

$$\lambda_\sigma(S_M(v)) = S_M(v)$$

for all $\sigma \in S_p$ and this holds if and only if the coefficients $\alpha_t$ of $S_M(v)$ are
equal, say $\alpha_t = \alpha_M$ for all $t \in G_M$. Hence, the symmetric tensors are precisely
the tensors of the form

$$v = \sum_{M} \Big( \alpha_M \sum_{t \in G_M} t \Big)$$

The tensor $v$ is antisymmetric if and only if

$$\lambda_\sigma(S_M(v)) = (-1)^{\sigma} S_M(v) \qquad (14.4)$$

In this case, the coefficients $\alpha_t$ of $S_M(v)$ differ only by sign. Before examining
this more closely, we observe that $M$ must be a set. For if $M$ has an element $k$
of multiplicity greater than 1, we can split $G_M$ into two disjoint parts:

$$G_M = G_M' \cup G_M''$$

where $G_M'$ are the tensors that have $e_k$ in positions $r$ and $s$:

$$G_M' = \{ e_{i_1} \otimes \cdots \otimes \underbrace{e_k}_{\text{position } r} \otimes \cdots \otimes \underbrace{e_k}_{\text{position } s} \otimes \cdots \otimes e_{i_p} \}$$

Then $\lambda_{(r\,s)}$ fixes each element of $G_M'$ and sends the elements of $G_M''$ to other
elements of $G_M''$. Hence, applying $\lambda_{(r\,s)}$ to the corresponding decomposition of
$S_M(v)$:

$$S_M(v) = S_M' + S_M''$$

gives

$$-(S_M' + S_M'') = -S_M = \lambda_{(r\,s)} S_M = S_M' + \lambda_{(r\,s)} S_M''$$

and so $S_M' = 0$, whence $S_M(v) = 0$. Thus, $M$ is a set.


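Now, since for any $\sigma \in S_p$,
$$G_M = \{\lambda_\sigma t \mid t \in G_M\}$$
equation (14.4) implies that
$$(-1)^\sigma \sum_{t \in G_M} \alpha_t t = \lambda_\sigma \Big( \sum_{t \in G_M} \alpha_t t \Big) = \sum_{t \in G_M} \alpha_t \lambda_\sigma t = \sum_{t \in G_M} \alpha_{\lambda_\sigma^{-1} t}\, t$$
which holds if and only if $\alpha_{\lambda_\sigma^{-1} t} = (-1)^\sigma \alpha_t$, or equivalently,
$$\alpha_{\lambda_\sigma t} = (-1)^\sigma \alpha_t$$
for all $t \in G_M$ and $\sigma \in S_p$. Choosing $u = u_M = e_{i_1} \otimes \cdots \otimes e_{i_p}$, where $i_1 < \cdots < i_p$, as standard-bearer, if $\sigma_{u,t}$ denotes the permutation for which $\lambda_{\sigma_{u,t}}(u) = t$, then
$$\alpha_t = (-1)^{\sigma_{u,t}} \alpha_u$$
Thus, $v$ is antisymmetric if and only if it has the form
$$v = \sum_M \Big( \alpha_M \sum_{t \in G_M} (-1)^{\sigma_{u,t}} t \Big)$$
where $\alpha_M = \alpha_{u_M} \neq 0$ and the sum is over a family of sets.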


In summary, the symmetric tensors are
$$v = \sum_M \Big( \alpha_M \sum_{t \in G_M} t \Big)$$
where $M$ is a multiset and the antisymmetric tensors are
$$v = \sum_M \Big( \alpha_M \sum_{t \in G_M} (-1)^{\sigma_{u,t}} t \Big)$$
where $M$ is a set.


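We can simplify these expressions considerably by representing the inside sums more succinctly. In the symmetric case, define a surjective linear map
$$\tau: T^p(V) \to F_p[e_1, \dots, e_n]$$
by
$$\tau(e_{i_1} \otimes \cdots \otimes e_{i_p}) = e_{i_1} \vee \cdots \vee e_{i_p}$$
and extending by linearity. Since $\tau$ takes every member of $G_M$ to the same monomial $\tau u_M = e_{i_1} \vee \cdots \vee e_{i_p}$, where $i_1 \leq \cdots \leq i_p$, we have
$$\tau v = \tau \Big( \sum_M \Big( \alpha_M \sum_{t \in G_M} t \Big) \Big) = \sum_M \alpha_M |G_M| \, \tau u_M$$
In the antisymmetric case, define a surjective linear map
$$\tau: T^p(V) \to F_p^-[e_1, \dots, e_n]$$
by
$$\tau(e_{i_1} \otimes \cdots \otimes e_{i_p}) = e_{i_1} \wedge \cdots \wedge e_{i_p}$$
and extending by linearity. Since
$$\tau t = (-1)^{\sigma_{u,t}} \tau u_M$$
we have
$$\tau v = \sum_M \Big( \alpha_M \sum_{t \in G_M} (-1)^{\sigma_{u,t}} (-1)^{\sigma_{u,t}} \tau u_M \Big) = \sum_M \alpha_M |G_M| \, \tau u_M$$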


Thus, in both cases,
$$\tau v = \sum_M \alpha_M |G_M| \, \tau u_M$$
where $u_M = e_{i_1} \otimes \cdots \otimes e_{i_p}$ with $i_1 \leq i_2 \leq \cdots \leq i_p$ and
$$\tau u_M = e_{i_1} \vee \cdots \vee e_{i_p} \quad \text{or} \quad \tau u_M = e_{i_1} \wedge \cdots \wedge e_{i_p}$$
depending on whether $v$ is symmetric or antisymmetric. However, in either case, the monomials $\tau u_M$ are linearly independent for distinct multisets/sets $M$. Therefore, if $\tau v = 0$ then $\alpha_M |G_M| = 0$ for all multisets/sets $M$. Hence, if $\operatorname{char}(F) = 0$, then $\alpha_M = 0$ and so $v = 0$. This shows that the restricted maps $\tau|_{ST^p(V)}$ and $\tau|_{AT^p(V)}$ are isomorphisms.


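Theorem 14.16 Let $V$ be a finite-dimensional vector space over a field $F$ with $\operatorname{char}(F) = 0$.
1) The symmetric tensor space $ST^p(V)$ is isomorphic to the algebra $F_p[e_1, \dots, e_n]$ of homogeneous polynomials of degree $p$, via the isomorphism
$$\tau \Big( \sum \alpha_{i_1, \dots, i_p} \, e_{i_1} \otimes \cdots \otimes e_{i_p} \Big) = \sum \alpha_{i_1, \dots, i_p} (e_{i_1} \vee \cdots \vee e_{i_p})$$
2) For $p \leq n$, the antisymmetric tensor space $AT^p(V)$ is isomorphic to the algebra $F_p^-[e_1, \dots, e_n]$ of anticommutative homogeneous polynomials of degree $p$, via the isomorphism
$$\tau \Big( \sum \alpha_{i_1, \dots, i_p} \, e_{i_1} \otimes \cdots \otimes e_{i_p} \Big) = \sum \alpha_{i_1, \dots, i_p} (e_{i_1} \wedge \cdots \wedge e_{i_p})$$ □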


The direct sum
$$ST(V) = \bigoplus_{p=0}^{\infty} ST^p(V) \approx F[e_1, \dots, e_n]$$
is called the symmetric tensor algebra of $V$ and the direct sum
$$AT(V) = \bigoplus_{p=0}^{n} AT^p(V) \approx F^-[e_1, \dots, e_n]$$
is called the antisymmetric tensor algebra or the exterior algebra of $V$. These vector spaces are graded algebras, where the product is defined using the vector space isomorphisms described in the previous theorem to move the products of $F[e_1, \dots, e_n]$ and $F^-[e_1, \dots, e_n]$ to $ST(V)$ and $AT(V)$, respectively.


Thus, restricting the domains of the maps $\tau$ gives a nice description of the symmetric and antisymmetric tensor algebras, when $\operatorname{char}(F) = 0$. However, there are many important fields, such as finite fields, that have nonzero characteristic. We can proceed in a different, albeit somewhat less appealing, manner that holds regardless of the characteristic of the base field. Namely, rather than restricting the domain of $\tau$ in order to get an isomorphism, we can factor out by the kernel of $\tau$.


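Consider a tensor
$$v = \sum_M S_M(v) = \sum_M \sum_{t \in G_M(v)} \alpha_t t$$
Since $\tau$ sends elements of different groups $G_M(v) = \{t_1, \dots, t_k\}$ to different monomials in $F_p[e_1, \dots, e_n]$ or $F_p^-[e_1, \dots, e_n]$, it follows that $v \in \ker(\tau)$ if and only if $\tau(S_M(v)) = 0$ for all $M$, that is, if and only if
$$\alpha_{t_1} \tau t_1 + \cdots + \alpha_{t_k} \tau t_k = 0$$
In the symmetric case, $\tau$ is constant on $G_M(v)$ and so $v \in \ker(\tau)$ if and only if
$$\alpha_{t_1} + \cdots + \alpha_{t_k} = 0$$
In the antisymmetric case, $\tau t_i = (-1)^{\sigma_{1,i}} \tau t_1$ where $\lambda_{\sigma_{1,i}}(t_1) = t_i$ and so $v \in \ker(\tau)$ if and only if
$$(-1)^{\sigma_{1,1}} \alpha_{t_1} + \cdots + (-1)^{\sigma_{1,k}} \alpha_{t_k} = 0$$
In both cases, we solve for $\alpha_{t_1}$ and substitute into $S_M(v)$. In the symmetric case,
$$\alpha_{t_1} = -\alpha_{t_2} - \cdots - \alpha_{t_k}$$
and so
$$S_M(v) = \alpha_{t_1} t_1 + \cdots + \alpha_{t_k} t_k = \alpha_{t_2}(t_2 - t_1) + \cdots + \alpha_{t_k}(t_k - t_1)$$
In the antisymmetric case, since $\sigma_{1,1}$ is the identity,
$$\alpha_{t_1} = -(-1)^{\sigma_{1,2}} \alpha_{t_2} - \cdots - (-1)^{\sigma_{1,k}} \alpha_{t_k}$$
and so
$$S_M(v) = \alpha_{t_1} t_1 + \cdots + \alpha_{t_k} t_k = (-1)^{\sigma_{1,2}} \alpha_{t_2} \big( (-1)^{\sigma_{1,2}} t_2 - t_1 \big) + \cdots + (-1)^{\sigma_{1,k}} \alpha_{t_k} \big( (-1)^{\sigma_{1,k}} t_k - t_1 \big)$$
Since $t_i \in \mathcal{B}$, it follows that $S_M(v)$, and therefore $v$, is in the span of tensors of the form $\lambda_\sigma(t) - t$ in the symmetric case and $(-1)^\sigma \lambda_\sigma(t) - t$ in the antisymmetric case, where $\sigma \in S_p$ and $t \in \mathcal{B}$.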


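Hence, in the symmetric case,
$$\ker(\tau) \subseteq I_p = \langle \lambda_\sigma(t) - t \mid t \in \mathcal{B}, \sigma \in S_p \rangle$$
and since $\tau(\lambda_\sigma(t) - t) = 0$, it follows that $\ker(\tau) = I_p$. In the antisymmetric case,
$$\ker(\tau) \subseteq I_p = \langle (-1)^\sigma \lambda_\sigma(t) - t \mid t \in \mathcal{B}, \sigma \in S_p \rangle$$
and since $\tau((-1)^\sigma \lambda_\sigma(t) - t) = 0$, it follows that $\ker(\tau) = I_p$.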


We now have quotient-space characterizations of the symmetric and 
antisymmetric tensor spaces that do not place any restriction on the 
characteristic of the base field. 


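Theorem 14.17 Let $V$ be a finite-dimensional vector space over a field $F$.
1) The surjective linear map $\tau: T^p(V) \to F_p[e_1, \dots, e_n]$ defined by
$$\tau \Big( \sum \alpha_{i_1, \dots, i_p} \, e_{i_1} \otimes \cdots \otimes e_{i_p} \Big) = \sum \alpha_{i_1, \dots, i_p} \, e_{i_1} \vee \cdots \vee e_{i_p}$$
has kernel
$$I_p = \langle \lambda_\sigma(t) - t \mid t \in \mathcal{B}, \sigma \in S_p \rangle$$
and so
$$\frac{T^p(V)}{I_p} \approx F_p[e_1, \dots, e_n]$$
The vector space $T^p(V)/I_p$ is also referred to as the symmetric tensor space of degree $p$ of $V$.
2) The surjective linear map $\tau: T^p(V) \to F_p^-[e_1, \dots, e_n]$ defined by
$$\tau \Big( \sum \alpha_{i_1, \dots, i_p} \, e_{i_1} \otimes \cdots \otimes e_{i_p} \Big) = \sum \alpha_{i_1, \dots, i_p} \, e_{i_1} \wedge \cdots \wedge e_{i_p}$$
has kernel
$$I_p = \langle (-1)^\sigma \lambda_\sigma(t) - t \mid t \in \mathcal{B}, \sigma \in S_p \rangle$$
and so
$$\frac{T^p(V)}{I_p} \approx F_p^-[e_1, \dots, e_n]$$
The vector space $T^p(V)/I_p$ is also referred to as the antisymmetric tensor space or exterior product space of degree $p$ of $V$. □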


The isomorphic exterior spaces $AT^p(V)$ and $T^p(V)/I_p$ are usually denoted by $\bigwedge^p V$ and the isomorphic exterior algebras $AT(V)$ and $T(V)/I$ are usually denoted by $\bigwedge V$.


Theorem 14.18 Let $V$ be a vector space of dimension $n$.
1) The dimension of the symmetric tensor space $ST^p(V)$ is equal to the number of monomials of degree $p$ in the variables $e_1, \dots, e_n$ and this is
$$\dim(ST^p(V)) = \binom{n + p - 1}{p}$$
2) The dimension of the exterior tensor space $\bigwedge^p(V)$ is equal to the number of words of length $p$ in ascending order over the alphabet $E = \{e_1, \dots, e_n\}$ and this is
$$\dim\Big(\bigwedge\nolimits^p(V)\Big) = \binom{n}{p}$$

Proof. For part 1), the dimension is equal to the number of multisets of size $p$ taken from an underlying set $\{e_1, \dots, e_n\}$ of size $n$. Such multisets correspond bijectively to the solutions, in nonnegative integers, of the equation
$$x_1 + x_2 + \cdots + x_n = p$$
where $x_i$ is the multiplicity of $e_i$ in the multiset. To count the number of solutions, invent two symbols x and /. Then any solution $x_i = s_i$ to the previous equation can be described by a sequence of x's and /'s consisting of $s_1$ x's followed by one /, followed by $s_2$ x's and another /, and so on. For example, if $p = 6$ and $n = 4$, the solution $3 + 1 + 0 + 2 = 6$ corresponds to the sequence

xxx/x//xx

Thus, the solutions correspond bijectively to sequences consisting of $p$ x's and $n - 1$ /'s. To count the number of such sequences, note that such a sequence can be formed by considering $n + p - 1$ "blanks" and selecting $p$ of these blanks for the x's. This can be done in
$$\binom{n + p - 1}{p}$$
ways. □
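The counting argument can be checked by brute force for small cases. The following is a minimal sketch, not part of the original text: the function names `count_multisets`, `count_ascending_words` and `stars_and_bars` are hypothetical helpers introduced only for this illustration.

```python
from itertools import combinations, combinations_with_replacement
from math import comb


def count_multisets(n: int, p: int) -> int:
    """Count multisets of size p drawn from n symbols by direct enumeration."""
    return sum(1 for _ in combinations_with_replacement(range(n), p))


def count_ascending_words(n: int, p: int) -> int:
    """Count strictly ascending words of length p over an alphabet of size n."""
    return sum(1 for _ in combinations(range(n), p))


def stars_and_bars(solution) -> str:
    """Encode a solution (s_1, ..., s_n) as runs of x's separated by /'s."""
    return "/".join("x" * s for s in solution)


if __name__ == "__main__":
    for n, p in [(4, 6), (3, 2), (5, 3)]:
        # Symmetric case: multisets of size p, counted by C(n + p - 1, p).
        assert count_multisets(n, p) == comb(n + p - 1, p)
        # Antisymmetric case: sets of size p, counted by C(n, p).
        assert count_ascending_words(n, p) == comb(n, p)
    print(stars_and_bars((3, 1, 0, 2)))
```

The enumeration only confirms the formulas for the sizes tried; the proof above is what establishes them in general.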
The Universal Property 


We defined tensor products through a universal property, which as we have seen 
is a powerful technique for determining the properties of tensor products. It is 
easy to show that the symmetric tensor spaces are universal for symmetric 
multilinear maps and the antisymmetric tensor spaces are universal for 
antisymmetric multilinear maps. 


Theorem 14.19 Let $V$ be a finite-dimensional vector space with basis $\{e_1, \dots, e_n\}$.
1) The pair $(F_p[e_1, \dots, e_n], t)$, where $t: V^{\times p} \to F_p[e_1, \dots, e_n]$ is the multilinear map defined by
$$t(e_{i_1}, \dots, e_{i_p}) = e_{i_1} \vee \cdots \vee e_{i_p}$$
is universal for symmetric $p$-linear maps with domain $V^{\times p}$; that is, for any symmetric $p$-linear map $f: V^{\times p} \to U$ where $U$ is a vector space, there is a unique linear map $\tau: F_p[e_1, \dots, e_n] \to U$ for which
$$\tau(e_{i_1} \vee \cdots \vee e_{i_p}) = f(e_{i_1}, \dots, e_{i_p})$$
2) The pair $(F_p^-[e_1, \dots, e_n], t)$, where $t: V^{\times p} \to F_p^-[e_1, \dots, e_n]$ is the multilinear map defined by
$$t(e_{i_1}, \dots, e_{i_p}) = e_{i_1} \wedge \cdots \wedge e_{i_p}$$
is universal for antisymmetric $p$-linear maps with domain $V^{\times p}$; that is, for any antisymmetric $p$-linear map $f: V^{\times p} \to U$ where $U$ is a vector space, there is a unique linear map $\tau: F_p^-[e_1, \dots, e_n] \to U$ for which
$$\tau(e_{i_1} \wedge \cdots \wedge e_{i_p}) = f(e_{i_1}, \dots, e_{i_p})$$

Proof. For part 1), the property
$$\tau(e_{i_1} \vee \cdots \vee e_{i_p}) = f(e_{i_1}, \dots, e_{i_p})$$
does indeed uniquely define a linear transformation $\tau$, provided that it is well-defined. However,
$$e_{i_1} \vee \cdots \vee e_{i_p} = e_{j_1} \vee \cdots \vee e_{j_p}$$
if and only if the multisets $\{e_{i_1}, \dots, e_{i_p}\}$ and $\{e_{j_1}, \dots, e_{j_p}\}$ are the same, which implies that $f(e_{i_1}, \dots, e_{i_p}) = f(e_{j_1}, \dots, e_{j_p})$, since $f$ is symmetric.

For part 2), since $f$ is antisymmetric, it is completely determined by the fact that it is alternate and by its values on the basis of ascending words $e_{i_1} \wedge \cdots \wedge e_{i_p}$. Accordingly, the condition
$$\tau(e_{i_1} \wedge \cdots \wedge e_{i_p}) = f(e_{i_1}, \dots, e_{i_p})$$
uniquely defines a linear transformation $\tau$. □
The Symmetrization Map 


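When $\operatorname{char}(F) = 0$, we can define a linear map $S: T^p(V) \to ST^p(V)$, called the symmetrization map, by
$$St = \frac{1}{p!} \sum_{\sigma \in S_p} \lambda_\sigma t$$
Since $\lambda_\tau \lambda_\sigma = \lambda_{\tau\sigma}$, we have
$$\lambda_\tau(St) = \frac{1}{p!} \sum_{\sigma \in S_p} \lambda_\tau \lambda_\sigma t = \frac{1}{p!} \sum_{\sigma \in S_p} \lambda_{\tau\sigma} t = \frac{1}{p!} \sum_{\sigma \in S_p} \lambda_\sigma t = St$$
and so $St$ is symmetric. The reason for the factor $1/p!$ is that if $v$ is a symmetric tensor, then $\lambda_\sigma v = v$ and so
$$Sv = \frac{1}{p!} \sum_{\sigma \in S_p} \lambda_\sigma v = \frac{1}{p!} (p!\, v) = v$$
that is, the symmetrization map fixes all symmetric tensors. It follows that for any tensor $t \in T^p(V)$,
$$S^2 t = St$$
Thus, $S$ is idempotent and is therefore the projection map of $T^p(V)$ onto $\operatorname{im}(S) = ST^p(V)$.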
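The idempotence of the symmetrization map can be illustrated numerically. In the sketch below, a tensor in $T^p(V)$ is represented (as an assumption made only for this example) by a dictionary mapping index tuples $(i_1, \dots, i_p)$ to coefficients, and `permute` and `symmetrize` are hypothetical helper names. Exact rational arithmetic stands in for a field of characteristic 0.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def permute(tensor, sigma):
    """Apply lambda_sigma: the factor in position i moves to position sigma[i]."""
    result = {}
    for index, coeff in tensor.items():
        new_index = [None] * len(index)
        for i, s in enumerate(sigma):
            new_index[s] = index[i]
        key = tuple(new_index)
        result[key] = result.get(key, 0) + coeff
    return result


def symmetrize(tensor, p):
    """Return S t = (1/p!) * (sum over sigma in S_p of lambda_sigma t)."""
    total = {}
    for sigma in permutations(range(p)):
        for key, coeff in permute(tensor, sigma).items():
            total[key] = total.get(key, 0) + coeff
    # Divide by p! exactly; drop basis tensors whose coefficient vanished.
    return {k: Fraction(c, factorial(p)) for k, c in total.items() if c != 0}


if __name__ == "__main__":
    t = {(0, 1, 2): 1}                      # the basis tensor e_0 (x) e_1 (x) e_2
    st = symmetrize(t, 3)
    assert symmetrize(st, 3) == st          # S is idempotent on this tensor
    sym = {(0, 1): 1, (1, 0): 1}            # already symmetric
    assert symmetrize(sym, 2) == sym        # S fixes symmetric tensors
```

Summing over all of $S_p$ makes the convention chosen for `permute` (moving positions forward rather than backward) irrelevant to the result.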


The Determinant 


The universal property for antisymmetric multilinear maps has the following 
corollary. 


Corollary 14.20 Let $V$ be a vector space of dimension $n$ over a field $F$. Let $\mathcal{E} = (e_1, \dots, e_n)$ be an ordered basis for $V$. Then there is at most one antisymmetric $n$-linear form $d: V^{\times n} \to F$ for which
$$d(e_1, \dots, e_n) = 1$$

Proof. According to the universal property for antisymmetric $n$-linear forms, for every antisymmetric $n$-linear form $f: V^{\times n} \to F$ satisfying $f(e_1, \dots, e_n) = 1$, there is a unique linear map $\tau_f: \bigwedge^n V \to F$ for which
$$\tau_f(e_1 \wedge \cdots \wedge e_n) = f(e_1, \dots, e_n) = 1$$
But $\bigwedge^n V$ has dimension 1 and so there is only one linear map $\sigma: \bigwedge^n V \to F$ with $\sigma(e_1 \wedge \cdots \wedge e_n) = 1$. Therefore, if $f$ and $g$ are two such forms, then $\tau_f = \sigma = \tau_g$, from which it follows that
$$f = \tau_f \circ t = \tau_g \circ t = g$$ □

We now wish to construct an antisymmetric form $d: V^{\times n} \to F$, which is unique by Corollary 14.20. Let $\mathcal{B} = (e_1, \dots, e_n)$ be a basis for $V$. For any $v \in V$, write $[v]_{\mathcal{B},i}$ for the $i$th coordinate of the coordinate matrix $[v]_{\mathcal{B}}$. Thus,
$$v = \sum_i [v]_{\mathcal{B},i} \, e_i$$
For clarity, and since we will not change the basis, let us write $[v]_i$ for $[v]_{\mathcal{B},i}$.


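Consider the map $d: V^{\times n} \to F$ defined by
$$d(v_1, \dots, v_n) = \sum_{\sigma \in S_n} (-1)^\sigma [v_1]_{\sigma 1} \cdots [v_n]_{\sigma n}$$
Then $d$ is multilinear since
$$\begin{aligned}
d(av_1 + bu_1, v_2, \dots, v_n) &= \sum_{\sigma \in S_n} (-1)^\sigma [av_1 + bu_1]_{\sigma 1} \cdots [v_n]_{\sigma n} \\
&= \sum_{\sigma \in S_n} (-1)^\sigma \big( a[v_1]_{\sigma 1} + b[u_1]_{\sigma 1} \big) \cdots [v_n]_{\sigma n} \\
&= a \sum_{\sigma \in S_n} (-1)^\sigma [v_1]_{\sigma 1} \cdots [v_n]_{\sigma n} + b \sum_{\sigma \in S_n} (-1)^\sigma [u_1]_{\sigma 1} \cdots [v_n]_{\sigma n}
\end{aligned}$$
and similarly for any coordinate position.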


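To see that $d$ is alternating, and therefore antisymmetric since $\operatorname{char}(F) \neq 2$, suppose for instance that $v_1 = v_2$. For any permutation $\sigma \in S_n$, let
$$\sigma' = (\sigma 1 \ \sigma 2)\sigma$$
Then $\sigma' x = \sigma x$ for $x \neq 1, 2$ and
$$\sigma' 1 = \sigma 2 \quad \text{and} \quad \sigma' 2 = \sigma 1$$
Hence, $\sigma' \neq \sigma$. Also, since $(\sigma')' = \sigma$, if the sets $\{\sigma, \sigma'\}$ and $\{\rho, \rho'\}$ intersect, then they are identical. Thus, the distinct sets $\{\sigma, \sigma'\}$ form a partition of $S_n$. It follows that
$$\begin{aligned}
d(v_1, v_1, v_3, \dots, v_n) &= \sum_{\sigma \in S_n} (-1)^\sigma [v_1]_{\sigma 1} [v_1]_{\sigma 2} \cdots [v_n]_{\sigma n} \\
&= \sum_{\text{pairs } \{\sigma, \sigma'\}} \Big[ (-1)^\sigma [v_1]_{\sigma 1} [v_1]_{\sigma 2} \cdots [v_n]_{\sigma n} + (-1)^{\sigma'} [v_1]_{\sigma' 1} [v_1]_{\sigma' 2} \cdots [v_n]_{\sigma' n} \Big]
\end{aligned}$$
But
$$[v_1]_{\sigma 1} [v_1]_{\sigma 2} = [v_1]_{\sigma' 1} [v_1]_{\sigma' 2}$$
and since $(-1)^{\sigma'} = -(-1)^\sigma$, the sum of the two terms involving the pair $\{\sigma, \sigma'\}$ is 0. Hence, $d(v_1, v_1, \dots, v_n) = 0$. A similar argument holds for any coordinate pair.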


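Finally,
$$d(e_1, \dots, e_n) = \sum_{\sigma \in S_n} (-1)^\sigma [e_1]_{\sigma 1} \cdots [e_n]_{\sigma n} = \sum_{\sigma \in S_n} (-1)^\sigma \delta_{1,\sigma 1} \cdots \delta_{n,\sigma n} = 1$$
Thus, the map $d$ is the unique antisymmetric $n$-linear form on $V^{\times n}$ for which $d(e_1, \dots, e_n) = 1$.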


Under the ordered basis $\mathcal{E} = (e_1, \dots, e_n)$, we can view $V$ as the space $F^n$ of coordinate vectors and view $V^{\times n}$ as the space $M_n(F)$ of $n \times n$ matrices, via the isomorphism
$$(v_1, \dots, v_n) \mapsto \begin{pmatrix} [v_1]_1 & \cdots & [v_n]_1 \\ \vdots & & \vdots \\ [v_1]_n & \cdots & [v_n]_n \end{pmatrix}$$
where all coordinate matrices are with respect to $\mathcal{E}$.

With this viewpoint, $d$ becomes an antisymmetric $n$-form on the columns of a matrix $A = (a_{i,j})$ given by
$$d(A) = \sum_{\sigma \in S_n} (-1)^\sigma a_{\sigma 1, 1} \cdots a_{\sigma n, n}$$
This is called the determinant of the matrix $A$.
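The permutation-sum formula can be evaluated directly for small matrices. The sketch below is illustrative only: `sign`, `det`, `matmul` and `transpose` are hypothetical helper names, matrices are plain lists of rows, and the $n!$ terms make this approach impractical beyond small $n$.

```python
from itertools import permutations


def sign(sigma):
    """Sign (-1)^sigma of a permutation tuple, computed by counting inversions."""
    inversions = sum(
        1
        for i in range(len(sigma))
        for j in range(i + 1, len(sigma))
        if sigma[i] > sigma[j]
    )
    return -1 if inversions % 2 else 1


def det(a):
    """d(A) = sum over sigma of (-1)^sigma * a[sigma(1)][1] * ... * a[sigma(n)][n]."""
    n = len(a)
    total = 0
    for sigma in permutations(range(n)):
        term = sign(sigma)
        for col in range(n):
            term *= a[sigma[col]][col]   # row sigma(col), column col
        total += term
    return total


def matmul(a, b):
    """Ordinary matrix product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(a):
    """Transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]


if __name__ == "__main__":
    A = [[2, 1, 0], [1, 3, 4], [0, 5, 6]]
    B = [[1, 2, 3], [0, 1, 4], [5, 6, 0]]
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    assert det(identity) == 1                      # d(e_1, ..., e_n) = 1
    assert det([[1, 1, 2], [3, 3, 4], [5, 5, 6]]) == 0   # two equal columns
    assert det(transpose(A)) == det(A)
    assert det(matmul(A, B)) == det(A) * det(B)
```

The last two assertions check, on one example, the transpose and product properties of the determinant; they are sanity checks rather than proofs.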
Properties of the Determinant 


Let us explore some of the properties of the determinant function. 
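Theorem 14.21 If $A \in M_n(F)$, then
$$d(A) = d(A^t)$$
Proof. We have
$$\begin{aligned}
d(A) &= \sum_{\sigma \in S_n} (-1)^\sigma a_{\sigma 1, 1} \cdots a_{\sigma n, n} \\
&= \sum_{\sigma \in S_n} (-1)^{\sigma^{-1}} a_{1, \sigma^{-1} 1} \cdots a_{n, \sigma^{-1} n} \\
&= \sum_{\sigma \in S_n} (-1)^\sigma a_{1, \sigma 1} \cdots a_{n, \sigma n} \\
&= d(A^t)
\end{aligned}$$
as desired. □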


Theorem 14.22 If $A, B \in M_n(F)$, then
$$d(AB) = d(A)d(B)$$
Proof. Consider the map $f_A: M_n(F) \to F$ defined by
$$f_A(X) = d(AX)$$
We can consider $f_A$ as a function on the columns of $X$ and think of it as a composition
$$f_A: (X^{(1)}, \dots, X^{(n)}) \mapsto (AX^{(1)}, \dots, AX^{(n)}) \mapsto d(AX)$$
Each step in this map is multilinear and so $f_A$ is multilinear. It is also clear that $f_A$ is antisymmetric and so $f_A$ is a scalar multiple of the determinant function, say $f_A(X) = c\,d(X)$. Then
$$d(AX) = f_A(X) = c\,d(X)$$
Setting $X = I_n$ gives $d(A) = c$ and so
$$d(AX) = d(A)d(X)$$
as desired. □
Theorem 14.23 A matrix $A \in M_n(F)$ is invertible if and only if $d(A) \neq 0$.
Proof. If $P \in M_n(F)$ is invertible, then $PP^{-1} = I_n$ and so
$$d(P)d(P^{-1}) = 1$$
which shows that $d(P) \neq 0$ and $d(P^{-1}) = 1/d(P)$. Conversely, any matrix $A \in M_n(F)$ is equivalent to a diagonal matrix
$$A = PDQ$$
where $P$ and $Q$ are invertible and $D$ is diagonal with 1's and 0's on the main diagonal. Hence,
$$d(A) = d(P)d(D)d(Q)$$
and so if $d(A) \neq 0$, then $d(D) \neq 0$, which happens if and only if $D = I_n$, whence $A$ is invertible. □


Exercises 


1. Show that if $\tau: W \to X$ is a linear map and $b: U \times V \to W$ is bilinear, then $\tau \circ b: U \times V \to X$ is bilinear.

2. Show that the only map that is both linear and $n$-linear (for $n \geq 2$) is the zero map.

3. Find an example of a bilinear map $\tau: V \times V \to W$ whose image $\operatorname{im}(\tau) = \{\tau(u, v) \mid u, v \in V\}$ is not a subspace of $W$.

4. Let $\mathcal{B} = \{u_i \mid i \in I\}$ be a basis for $U$ and let $\mathcal{C} = \{v_j \mid j \in J\}$ be a basis for $V$. Show that the set
$$\mathcal{D} = \{u_i \otimes v_j \mid i \in I, j \in J\}$$
is a basis for $U \otimes V$ by showing that it is linearly independent and spans.

5. Prove that the following property of a pair $(W, g: U \times V \to W)$ with $g$ bilinear characterizes the tensor product $(U \otimes V, t: U \times V \to U \otimes V)$ up to isomorphism, and thus could have been used as the definition of tensor product: For a pair $(W, g: U \times V \to W)$ with $g$ bilinear, if $\{u_i\}$ is a basis for $U$ and $\{v_j\}$ is a basis for $V$, then $\{g(u_i, v_j)\}$ is a basis for $W$.

6. Prove that $U \otimes V \approx V \otimes U$.

7. Let $X$ and $Y$ be nonempty sets. Use the universal property of tensor products to prove that $F_{X \times Y} \approx F_X \otimes F_Y$.

8. Let $u, u' \in U$ and $v, v' \in V$. Assuming that $u \otimes v \neq 0$, show that $u \otimes v = u' \otimes v'$ if and only if $u' = ru$ and $v' = r^{-1}v$, for $r \neq 0$.

9. Let $\mathcal{B} = \{b_i\}$ be a basis for $U$ and $\mathcal{C} = \{c_j\}$ be a basis for $V$. Show that any function $f: \mathcal{B} \times \mathcal{C} \to W$ can be extended to a linear function $\overline{f}: U \otimes V \to W$. Deduce that the function $f$ can be extended in a unique way to a bilinear map $\widehat{f}: U \times V \to W$. Show that all bilinear maps are obtained in this way.

10. Let $S_1, S_2$ be subspaces of $U$. Show that
$$(S_1 \otimes V) \cap (S_2 \otimes V) \approx (S_1 \cap S_2) \otimes V$$

11. Let $S \subseteq U$ and $T \subseteq V$ be subspaces of vector spaces $U$ and $V$, respectively. Show that
$$(S \otimes V) \cap (U \otimes T) \approx S \otimes T$$

12. Let $S_1, S_2 \subseteq U$ and $T_1, T_2 \subseteq V$ be subspaces of $U$ and $V$, respectively. Show that
$$(S_1 \otimes T_1) \cap (S_2 \otimes T_2) \approx (S_1 \cap S_2) \otimes (T_1 \cap T_2)$$

13. Find an example of two vector spaces $U$ and $V$ and a nonzero vector $x \in U \otimes V$ that has at least two distinct (not including order of the terms) representations of the form
$$x = \sum_{i=1}^{n} u_i \otimes v_i$$
where the $u_i$'s are linearly independent and so are the $v_i$'s.

14. Let $\iota_X$ denote the identity operator on a vector space $X$. Prove that $\iota_V \otimes \iota_W = \iota_{V \otimes W}$.

15. Suppose that $\tau_1: U \to V$, $\tau_2: V \to W$ and $\sigma_1: U' \to V'$, $\sigma_2: V' \to W'$. Prove that
$$(\tau_2 \circ \tau_1) \otimes (\sigma_2 \circ \sigma_1) = (\tau_2 \otimes \sigma_2) \circ (\tau_1 \otimes \sigma_1)$$

16. Connect the two approaches to extending the base field of an $F$-space $V$ to $K$ (at least in the finite-dimensional case) by showing that $F^n \otimes_F K \approx K^n$.

17. Prove that in a tensor product $U \otimes U$ for which $\dim(U) \geq 2$ not all vectors have the form $u \otimes v$ for some $u, v \in U$. Hint: Suppose that $u, v \in U$ are linearly independent and consider $u \otimes v + v \otimes u$.

18. Prove that for the block matrix
$$M = \begin{pmatrix} A & B \\ 0 & C \end{pmatrix}_{\text{block}}$$
we have $d(M) = d(A)d(C)$.

19. Let $A, B \in M_n(F)$. Prove that if either $A$ or $B$ is invertible, then the matrices $A + \alpha B$ are invertible except for a finite number of $\alpha$'s.


The Tensor Product of Matrices

20. Let $A = (a_{i,j})$ be the matrix of a linear operator $\tau \in \mathcal{L}(V)$ with respect to the ordered basis $\mathcal{A} = (u_1, \dots, u_n)$. Let $B = (b_{i,j})$ be the matrix of a linear operator $\sigma \in \mathcal{L}(V)$ with respect to the ordered basis $\mathcal{B} = (v_1, \dots, v_m)$. Consider the ordered basis $\mathcal{C} = (u_i \otimes v_j)$ ordered lexicographically; that is, $u_i \otimes v_j < u_\ell \otimes v_k$ if $i < \ell$ or $i = \ell$ and $j < k$. Show that the matrix of $\tau \otimes \sigma$ with respect to $\mathcal{C}$ is
$$A \otimes B = \begin{pmatrix} a_{1,1}B & a_{1,2}B & \cdots & a_{1,n}B \\ a_{2,1}B & a_{2,2}B & \cdots & a_{2,n}B \\ \vdots & \vdots & & \vdots \\ a_{n,1}B & a_{n,2}B & \cdots & a_{n,n}B \end{pmatrix}_{\text{block}}$$
This matrix is called the tensor product, Kronecker product or direct product of the matrix $A$ with the matrix $B$.

21. Show that the tensor product is not, in general, commutative.

22. Show that the tensor product $A \otimes B$ is bilinear in both $A$ and $B$.

23. Show that $A \otimes B = 0$ if and only if $A = 0$ or $B = 0$.

24. Show that
a) $(A \otimes B)^t = A^t \otimes B^t$
b) $(A \otimes B)^* = A^* \otimes B^*$ (when $F = \mathbb{C}$)

25. Show that if $u, v \in F^n$, then (as row vectors) $u^t v = u^t \otimes v$.

26. Suppose that $A_{m,n}$, $B_{p,q}$, $C_{n,k}$ and $D_{q,r}$ are matrices of the given sizes. Prove that
$$(A \otimes B)(C \otimes D) = (AC) \otimes (BD)$$
Discuss the case $k = r = 1$.

27. Prove that if $A$ and $B$ are nonsingular, then so is $A \otimes B$ and
$$(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$$

28. Prove that $\operatorname{tr}(A \otimes B) = \operatorname{tr}(A) \cdot \operatorname{tr}(B)$.

29. Suppose that $F$ is algebraically closed. Prove that if $A$ has eigenvalues $\lambda_1, \dots, \lambda_n$ and $B$ has eigenvalues $\mu_1, \dots, \mu_m$, both lists including multiplicity, then $A \otimes B$ has eigenvalues $\{\lambda_i \mu_j \mid i \leq n, j \leq m\}$, again counting multiplicity.

30. Prove that $\det(A_{n,n} \otimes B_{m,m}) = (\det(A_{n,n}))^m (\det(B_{m,m}))^n$.
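Before attempting the matrix exercises, it can help to experiment with small cases. The following sketch builds the Kronecker product exactly as in the block description of Exercise 20 and checks a few of the stated identities on one example. The helper names `kron`, `matmul` and `trace` are hypothetical, and a passing check on one example is of course not a proof.

```python
def kron(a, b):
    """Kronecker product: block (i, j) of the result is a[i][j] * B."""
    return [
        [a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
        for i in range(len(a))
        for k in range(len(b))
    ]


def matmul(a, b):
    """Ordinary matrix product of compatible matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def trace(a):
    """Sum of the diagonal entries of a square matrix."""
    return sum(a[i][i] for i in range(len(a)))


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[0, 1], [1, 0]]
    C = [[2, 0], [1, 1]]
    D = [[1, 1], [0, 2]]
    # Mixed-product property (Exercise 26).
    assert matmul(kron(A, B), kron(C, D)) == kron(matmul(A, C), matmul(B, D))
    # Trace is multiplicative (Exercise 28).
    assert trace(kron(A, B)) == trace(A) * trace(B)
    # The product is not commutative in general (Exercise 21).
    assert kron(A, B) != kron(B, A)
```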


Chapter 15 
Positive Solutions to Linear Systems: 
Convexity and Separation 


It is of interest to determine conditions that guarantee the existence of positive solutions to homogeneous systems of linear equations
$$Ax = 0$$
where $A \in M_{m,n}(\mathbb{R})$.

Definition Let $v = (a_1, \dots, a_n) \in \mathbb{R}^n$.
1) $v$ is nonnegative, written $v \geq 0$, if
$$a_i \geq 0 \text{ for all } i$$
(The term positive is also used for this property.) The set of all nonnegative vectors in $\mathbb{R}^n$ is the nonnegative orthant in $\mathbb{R}^n$.
2) $v$ is strictly positive, written $v > 0$, if $v$ is nonnegative but not 0, that is, if
$$a_i \geq 0 \text{ for all } i \text{ and } a_j > 0 \text{ for some } j$$
The set $\mathbb{R}^n_+$ of all strictly positive vectors in $\mathbb{R}^n$ is the strictly positive orthant in $\mathbb{R}^n$.
3) $v$ is strongly positive, written $v \gg 0$, if
$$a_i > 0 \text{ for all } i$$
The set $\mathbb{R}^n_{++}$ of all strongly positive vectors in $\mathbb{R}^n$ is the strongly positive orthant in $\mathbb{R}^n$. □


We are interested in conditions under which the system $Ax = 0$ has strictly positive or strongly positive solutions. Since the strictly and strongly positive orthants in $\mathbb{R}^n$ are not subspaces of $\mathbb{R}^n$, it is difficult to use strictly linear methods in studying this issue: we must also use geometric methods, in particular, methods of convexity.

Let us pause briefly to consider an important application of strictly positive solutions to a system $Ax = 0$. If $X = (x_1, \dots, x_n)$ is a strictly positive solution to this system, then so is the vector
$$\Pi = \frac{1}{\sum x_i} X = \frac{1}{\sum x_i} (x_1, \dots, x_n) = (\pi_1, \dots, \pi_n)$$
which is a probability distribution, that is, $0 \leq \pi_i \leq 1$ and $\sum \pi_i = 1$. Moreover, if $X$ is a strongly positive solution, then $\Pi$ has the property that each probability is positive.

Now, the product $A\Pi$ is the expected value of the columns of $A$ with respect to the probability distribution $\Pi$. Hence, $Ax = 0$ has a strictly (strongly) positive solution if and only if there is a strictly (strongly) positive probability distribution for which the columns of $A$ have expected value 0. If each column of $A$ represents the possible payoffs from a game of chance, where each row is a different possible outcome of the game, then the game is fair when the expected value of the columns is 0. Thus, $Ax = 0$ has a strictly (strongly) positive solution $X$ if and only if the game with payoffs $A$ and probabilities $X$ is fair.
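The normalization step is easy to carry out explicitly. The sketch below uses a made-up matrix `A` and solution `x` chosen only for illustration, and the helper names `normalize` and `apply` are hypothetical. It rescales a positive solution of $Ax = 0$ to a probability distribution and confirms that it is still a solution.

```python
from fractions import Fraction


def normalize(x):
    """Rescale a nonnegative, nonzero vector so that its entries sum to 1."""
    total = sum(x)
    if any(c < 0 for c in x) or total == 0:
        raise ValueError("x must be nonnegative and not the zero vector")
    return [Fraction(c) / total for c in x]


def apply(a, x):
    """Matrix-vector product A x, with A given as a list of rows."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in a]


if __name__ == "__main__":
    A = [[1, -2, 1], [3, 0, -3]]     # illustrative matrix (an assumption)
    x = [2, 2, 2]                    # a strongly positive solution of A x = 0
    assert apply(A, x) == [0, 0]
    pi = normalize(x)
    assert sum(pi) == 1 and all(0 <= p <= 1 for p in pi)
    assert apply(A, pi) == [0, 0]    # the expected value of the columns is 0
```

Because $Ax = 0$ is homogeneous, any positive rescaling of a solution is again a solution, which is all the normalization relies on.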


As another (related) example, in discrete option pricing models of mathematical 
finance, the absence of arbitrage opportunities in the model is equivalent to the 
fact that a certain vector describing the gains in a portfolio does not intersect the 
strictly positive orthant in $\mathbb{R}^n$. As we will see in this chapter, this is equivalent
to the existence of a strongly positive solution to a homogeneous system of 
equations. This solution, when normalized to a probability distribution, is called 
a martingale measure. 


Of course, the equation $Ax = 0$ has a strictly positive solution if and only if $\ker(A)$ contains a strictly positive vector, that is, if and only if
$$\ker(A) = \operatorname{RowSpace}(A)^\perp$$
meets the strictly positive orthant in $\mathbb{R}^n$. Thus, we wish to characterize the subspaces $S$ of $\mathbb{R}^n$ for which $S^\perp$ meets the strictly positive orthant in $\mathbb{R}^n$, in symbols,
$$S^\perp \cap \mathbb{R}^n_+ \neq \emptyset$$
for these are precisely the row spaces of the matrices $A$ for which $Ax = 0$ has a strictly positive solution. A similar statement holds for strongly positive solutions.


Looking at the real plane $\mathbb{R}^2$, we can divine the answer with a picture. A one-dimensional subspace $S$ of $\mathbb{R}^2$ has the property that its orthogonal complement $S^\perp$ meets the strictly positive orthant (quadrant) in $\mathbb{R}^2$ if and only if $S$ is the $x$-axis, the $y$-axis or a line with negative slope. For the case of the strongly positive orthant, $S$ must have negative slope. Our task is to generalize this to $\mathbb{R}^n$.

This will lead us to the following results, which are quite intuitive in $\mathbb{R}^2$ and $\mathbb{R}^3$:
$$S^\perp \cap \mathbb{R}^n_{++} \neq \emptyset \iff S \cap \mathbb{R}^n_+ = \emptyset \qquad (15.1)$$
and
$$S^\perp \cap \mathbb{R}^n_+ \neq \emptyset \iff S \cap \mathbb{R}^n_{++} = \emptyset \qquad (15.2)$$
Let us translate these statements into the language of the matrix equation $Ax = 0$. If $S = \operatorname{RowSpace}(A)$, then $S^\perp = \ker(A)$ and so we have
$$\ker(A) \cap \mathbb{R}^n_{++} \neq \emptyset \iff \operatorname{RowSpace}(A) \cap \mathbb{R}^n_+ = \emptyset$$
and
$$\ker(A) \cap \mathbb{R}^n_+ \neq \emptyset \iff \operatorname{RowSpace}(A) \cap \mathbb{R}^n_{++} = \emptyset$$
Now
$$\operatorname{RowSpace}(A) \cap \mathbb{R}^n_+ = \{vA \mid vA > 0\}$$
and
$$\operatorname{RowSpace}(A) \cap \mathbb{R}^n_{++} = \{vA \mid vA \gg 0\}$$
and so these statements become
$$Ax = 0 \text{ has a strongly positive solution} \iff \{vA \mid vA > 0\} = \emptyset$$
and
$$Ax = 0 \text{ has a strictly positive solution} \iff \{vA \mid vA \gg 0\} = \emptyset$$


We can rephrase these results in the form of a theorem of the alternative, that 
is, a theorem that says that exactly one of two conditions holds. 


Theorem 15.1 Let $A \in M_{m,n}(\mathbb{R})$.
1) Exactly one of the following holds:
a) $Au = 0$ for some strongly positive $u \in \mathbb{R}^n$.
b) $vA > 0$ for some $v \in \mathbb{R}^m$.
2) Exactly one of the following holds:
a) $Au = 0$ for some strictly positive $u \in \mathbb{R}^n$.
b) $vA \gg 0$ for some $v \in \mathbb{R}^m$. □
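As a small sanity check of the two alternatives (an illustration added here, not part of the proof), take $m = 1$ and $n = 2$, so that $v$ is a single real number and $vA$ is a multiple of the one row of $A$. The three matrices below are chosen only for this example.

```latex
\begin{aligned}
A = \begin{pmatrix} 1 & -1 \end{pmatrix}: \quad
  & Au = 0 \text{ for } u = (1,1) \gg 0, \text{ while } vA = (v,-v) \geq 0 \text{ forces } v = 0. \\
  & \text{So 1a) and 2a) hold and 1b) and 2b) fail.} \\[4pt]
A = \begin{pmatrix} 1 & 1 \end{pmatrix}: \quad
  & Au = u_1 + u_2 = 0 \text{ has no solution } u > 0, \text{ while } vA = (1,1) \gg 0 \text{ for } v = 1. \\
  & \text{So 1b) and 2b) hold and 1a) and 2a) fail.} \\[4pt]
A = \begin{pmatrix} 1 & 0 \end{pmatrix}: \quad
  & Au = u_1 = 0 \text{ has the solution } u = (0,1) > 0 \text{ but no solution } u \gg 0; \\
  & vA = (v,0) \text{ is } > 0 \text{ for } v = 1 \text{ but is never } \gg 0. \\
  & \text{So 1b) and 2a) hold and 1a) and 2b) fail.}
\end{aligned}
```

In each case exactly one alternative holds in part 1) and exactly one in part 2), and the third matrix shows that the two parts are genuinely different statements.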


Before proving Theorem 15.1, we require some background. 
Convex, Closed and Compact Sets 


We shall need the following concepts. 


Definition
1) Let $x_1, \dots, x_k \in \mathbb{R}^n$. Any linear combination of the form
$$t_1 x_1 + \cdots + t_k x_k$$
where $0 \leq t_i \leq 1$ and $t_1 + \cdots + t_k = 1$ is called a convex combination of the vectors $x_1, \dots, x_k$.

2) A subset $X \subseteq \mathbb{R}^n$ is convex if whenever $x, y \in X$, then the line segment between $x$ and $y$ also lies in $X$, in symbols,
$$\{tx + (1 - t)y \mid 0 \leq t \leq 1\} \subseteq X$$

3) A subset $X \subseteq \mathbb{R}^n$ is closed if whenever $(x_n)$ is a convergent sequence of elements of $X$, then the limit is also in $X$.

4) A subset $X \subseteq \mathbb{R}^n$ is compact if it is both closed and bounded.

5) A subset $X \subseteq \mathbb{R}^n$ is a cone if $x \in X$ implies that $ax \in X$ for all $a > 0$. □


We will also have need of the following facts from analysis. 


1) A continuous function that is defined on a compact set $X$ in $\mathbb{R}^n$ takes on maximum and minimum values at some points within the set $X$.

2) A subset $X$ of $\mathbb{R}^n$ is compact if and only if every sequence in $X$ has a subsequence that converges to a point in $X$.

Theorem 15.2 Let $X$ and $Y$ be subsets of $\mathbb{R}^n$. Define
$$X + Y = \{a + b \mid a \in X, b \in Y\}$$
1) If $X$ and $Y$ are convex, then so is $X + Y$.

2) If $X$ is compact and $Y$ is closed, then $X + Y$ is closed.

Proof. For 1), let $x_0 + y_0$ and $x_1 + y_1$ be in $X + Y$. The line segment between these two points is
$$t(x_0 + y_0) + (1 - t)(x_1 + y_1) = [tx_0 + (1 - t)x_1] + [ty_0 + (1 - t)y_1] \in X + Y$$
for $0 \leq t \leq 1$ and so $X + Y$ is convex.

For part 2), let $x_n + y_n$ be a convergent sequence in $X + Y$. Suppose that $x_n + y_n \to z$. We must show that $z \in X + Y$. Since $x_n$ is a sequence in the compact set $X$, it has a convergent subsequence $x_{n_k}$ whose limit $x$ lies in $X$. Since $x_{n_k} + y_{n_k} \to z$ and $x_{n_k} \to x$ we can conclude that $y_{n_k} \to z - x$. Since $Y$ is closed, it follows that $z - x \in Y$ and so $z = x + (z - x) \in X + Y$. □


Convex Hulls 


We will also have use for the notion of the smallest convex set containing a given set.

Definition The convex hull of a set $S = \{x_1, \dots, x_k\}$ of vectors in $\mathbb{R}^n$ is the smallest convex set in $\mathbb{R}^n$ that contains $S$. We will denote the convex hull of $S$ by $C(S)$. □

Here is a characterization of convex hulls.

Theorem 15.3 Let $S = \{x_1, \dots, x_k\}$ be a set of vectors in $\mathbb{R}^n$. Then the convex hull $C(S)$ is the set $\Delta$ of all convex combinations of vectors in $S$, that is,
$$C(S) = \Delta = \Big\{ t_1 x_1 + \cdots + t_k x_k \;\Big|\; 0 \leq t_i \leq 1, \ \sum t_i = 1 \Big\}$$

Proof. Clearly, if $D$ is a convex set that contains $S$, then $D$ also contains $\Delta$. Hence $\Delta \subseteq C(S)$. To prove the reverse inclusion, we need only show that $\Delta$ is convex, since then $S \subseteq \Delta$ implies that $C(S) \subseteq \Delta$. So let
$$X = t_1 x_1 + \cdots + t_k x_k$$
$$Y = s_1 x_1 + \cdots + s_k x_k$$
be in $\Delta$. If $a + b = 1$ and $0 \leq a, b \leq 1$ then
$$aX + bY = a(t_1 x_1 + \cdots + t_k x_k) + b(s_1 x_1 + \cdots + s_k x_k) = (at_1 + bs_1)x_1 + \cdots + (at_k + bs_k)x_k$$
But this is also a convex combination of the vectors in $S$, because
$$0 \leq at_i + bs_i \leq (a + b) \cdot \max(s_i, t_i) = \max(s_i, t_i) \leq 1$$
and
$$\sum_{i=1}^{k} (at_i + bs_i) = a \sum_{i=1}^{k} t_i + b \sum_{i=1}^{k} s_i = a + b = 1$$
Thus, $aX + bY \in \Delta$. □


Theorem 15.4 The convex hull $C(S)$ of a finite set $S = \{x_1, \dots, x_k\}$ of vectors in $\mathbb{R}^n$ is a compact set.
Proof. The set
$$D = \Big\{ (t_1, \dots, t_k) \;\Big|\; 0 \leq t_i \leq 1, \ \sum t_i = 1 \Big\}$$
is closed and bounded in $\mathbb{R}^k$ and therefore compact. Define a function $f: D \to \mathbb{R}^n$ as follows: If $t = (t_1, \dots, t_k)$, then
$$f(t) = t_1 x_1 + \cdots + t_k x_k$$
To see that $f$ is continuous, let $s = (s_1, \dots, s_k)$ and let $M = \max(\|x_i\|)$. Given $\epsilon > 0$, if $\|s - t\| < \epsilon / kM$ then
$$|s_i - t_i| \leq \|s - t\| < \frac{\epsilon}{kM}$$
and so
$$\begin{aligned}
\|f(s) - f(t)\| &= \|(s_1 - t_1)x_1 + \cdots + (s_k - t_k)x_k\| \\
&\leq |s_1 - t_1| \|x_1\| + \cdots + |s_k - t_k| \|x_k\| \\
&\leq kM \|s - t\| \\
&< \epsilon
\end{aligned}$$
Finally, since $f(D) = C(S)$ and the continuous image of a compact set is compact, it follows that $C(S)$ is compact. □
Linear and Affine Hyperplanes 


We next discuss hyperplanes in $\mathbb{R}^n$. A linear hyperplane in $\mathbb{R}^n$ is an $(n - 1)$-dimensional subspace of $\mathbb{R}^n$. As such, it is the solution set of a linear equation of the form
$$a_1 x_1 + \cdots + a_n x_n = 0$$
or
$$\langle N, x \rangle = 0$$
where $N = (a_1, \dots, a_n)$ is nonzero and $x = (x_1, \dots, x_n)$. Geometrically speaking, this is the set of all vectors in $\mathbb{R}^n$ that are perpendicular (normal) to the vector $N$.

An affine hyperplane, or just hyperplane, in $\mathbb{R}^n$ is a linear hyperplane that has been translated by a vector. Thus, it is the solution set to an equation of the form
$$a_1(x_1 - b_1) + \cdots + a_n(x_n - b_n) = 0$$
or equivalently,
$$\langle N, x \rangle = b$$
where $b = a_1 b_1 + \cdots + a_n b_n$. We denote this hyperplane by
$$H(N, b) = \{x \in \mathbb{R}^n \mid \langle N, x \rangle = b\}$$
Note that the hyperplane
$$H(N, \|N\|^2) = \{x \in \mathbb{R}^n \mid \langle N, x \rangle = \|N\|^2\}$$
contains the point $N$, which is the point of $H(N, \|N\|^2)$ closest to the origin, since Cauchy's inequality gives
$$\|N\|^2 = \langle N, x \rangle \leq \|N\| \|x\|$$
and so $\|N\| \leq \|x\|$ for all $x \in H(N, \|N\|^2)$. Moreover, we leave it as an exercise to show that any hyperplane has the form $H(N, \|N\|^2)$ for an appropriate vector $N$.

A hyperplane defines two closed half-spaces
$$H_+(N, b) = \{x \in \mathbb{R}^n \mid \langle N, x \rangle \geq b\}$$
$$H_-(N, b) = \{x \in \mathbb{R}^n \mid \langle N, x \rangle \leq b\}$$
and two disjoint open half-spaces
$$H_+^\circ(N, b) = \{x \in \mathbb{R}^n \mid \langle N, x \rangle > b\}$$
$$H_-^\circ(N, b) = \{x \in \mathbb{R}^n \mid \langle N, x \rangle < b\}$$
It is clear that
$$H_+(N, b) \cap H_-(N, b) = H(N, b)$$
and that the sets $H_+^\circ(N, b)$, $H_-^\circ(N, b)$ and $H(N, b)$ form a partition of $\mathbb{R}^n$.

If $N \in \mathbb{R}^n$ and $X \subseteq \mathbb{R}^n$, we let
$$\langle N, X \rangle = \{\langle N, x \rangle \mid x \in X\}$$
and write
$$\langle N, X \rangle \leq b$$
to denote the fact that $\langle N, x \rangle \leq b$ for all $x \in X$.


Definition Two subsets $X$ and $Y$ of $\mathbb{R}^n$ are strictly separated by a hyperplane $H(N, b)$ if $X$ lies in one open half-space determined by $H(N, b)$ and $Y$ lies in the other open half-space; in symbols, one of the following holds:

1) $\langle N, X \rangle < b < \langle N, Y \rangle$

2) $\langle N, Y \rangle < b < \langle N, X \rangle$ □

Note that 1) holds for $N$ and $b$ if and only if 2) holds for $-N$ and $-b$, and so we need only consider one of the conditions to demonstrate that two sets $X$ and $Y$ are not strictly separated. Specifically, if 1) fails for all $N$ and $b$, then the condition
$$\langle -N, Y \rangle < -b < \langle -N, X \rangle$$
also fails for all $N$ and $b$ and so 2) also fails for all $N$ and $b$, whence $X$ and $Y$ are not strictly separated.

Definition Two subsets $X$ and $Y$ of $\mathbb{R}^n$ are strongly separated by a hyperplane $H(N, b)$ if there is an $\epsilon > 0$ for which one of the following holds:

1) $\langle N, X \rangle < b - \epsilon < b + \epsilon < \langle N, Y \rangle$

2) $\langle N, Y \rangle < b - \epsilon < b + \epsilon < \langle N, X \rangle$ □

As before, we need only consider one of the conditions to show that two sets are not strongly separated. Note also that if
$$\langle N, x \rangle < r \leq \langle N, Y \rangle$$
for $r \in \mathbb{R}$, then $x \in \mathbb{R}^n$ and $Y \subseteq \mathbb{R}^n$ are strongly separated by the hyperplane
$$H\Big(N, \frac{\langle N, x \rangle + r}{2}\Big)$$


Separation 


Now that we have the preliminaries out of the way, we can get down to some 
theorems. The first is a well-known separation theorem that is the basis for 
many other separation theorems. It says that if a closed convex set $C \subseteq \mathbb{R}^n$ does not contain a vector $b$, then $C$ can be strongly separated from $b$.

Theorem 15.5 Let $C$ be a closed convex subset of $\mathbb{R}^n$.
1) $C$ contains a unique vector $N$ of minimum norm, that is, there is a unique vector $N \in C$ for which
$$\|N\| < \|x\|$$
for all $x \in C$, $x \neq N$.
2) If $b \notin C$, then $C$ lies in the closed half-space
$$H_+(N, \|N\|^2 + \langle N, b \rangle)$$
that is,
$$\langle N, C \rangle \geq \|N\|^2 + \langle N, b \rangle > \langle N, b \rangle$$
where $N$ is the unique vector of minimum norm in the closed convex set
$$C - b = \{c - b \mid c \in C\}$$
Hence, $C$ and $b$ are strongly separated by the hyperplane
$$H\Big(N, \frac{\|N\|^2 + 2\langle N, b \rangle}{2}\Big)$$

Proof. For part 1), if 0 € C then this is the unique vector of minimum norm, so 
we may assume that 0 ¢ C. It follows that no two distinct elements of C can be 
negative scalar multiples of each other. For if x and rx were in C, where r < 0, 
then taking t = —r/(1 — r) gives 


0O=ta+(1—-tyraeCc 


which is false. 


Positive Solutions to Linear Systems: Convexity and Separation 419 


We first show that C contains a vector N of minimum norm. Recall that the 
Euclidean norm (distance) is a continuous function. Although C need not be 
compact, if we choose a real number s such that the closed ball 


B,(0) = {2 € R” | [lz|| < s} 


intersects C’, then that intersection C” = C N B,(0) is both closed and bounded 
and so is compact. The norm function therefore achieves its minimum on C”, 
say at the point N € C” C C. It is clear that if ||v|| < || N || for some v € C, then 
v € C", in contradiction to the minimality of N. Hence, N is a vector of 
minimum norm in C. 


We establish uniqueness first for closed line segments $[u, v]$ in $\mathbb{R}^n$. If $u = rv$ where $r > 0$, then

$$\|tu + (1 - t)v\| = |tr + (1 - t)|\,\|v\|$$

is smallest when $t = 0$ for $r > 1$ and $t = 1$ for $r < 1$. Assume that $u$ and $v$ are not scalar multiples of each other and suppose that $x \ne y$ in $[u, v]$ have minimum norm $d > 0$. If $z = (x + y)/2$ then since $x$ and $y$ are also not scalar multiples of each other, the Cauchy-Schwarz inequality is strict and so

$$\|z\|^2 = \frac{1}{4}\|x + y\|^2 = \frac{1}{4}\left(\|x\|^2 + 2\langle x, y\rangle + \|y\|^2\right) < \frac{1}{2}\left(d^2 + \|x\|\,\|y\|\right) = d^2$$

which contradicts the minimality of $d$. Thus, $[u, v]$ has a unique point of minimum norm.


Finally, if $x \in C$ also has minimum norm, then $N$ and $x$ are points of minimum norm in the line segment $[N, x] \subseteq C$ and so $x = N$. Hence, $C$ has a unique element of minimum norm.


For part 2), suppose the result is true for $b = 0$, that is, when $0 \notin C$. Then $b \notin C$ implies that $0 \notin C - b$ and so if $N \in C - b$ has smallest norm, then

$$\langle N, C - b\rangle \ge \|N\|^2 > 0$$

Therefore,

$$\langle N, C\rangle \ge \|N\|^2 + \langle N, b\rangle > \langle N, b\rangle$$

and so $C$ and $b$ are strongly separated by the hyperplane

$$H\left(N, \tfrac{1}{2}\|N\|^2 + \langle N, b\rangle\right)$$

Thus, we need only prove part 2) for $b = 0$, that is, we need only prove that

$$\langle N, C\rangle \ge \|N\|^2$$

If there is a nonzero $x \in C$ for which

$$\langle N, x\rangle < \|N\|^2$$

then $x \ne N$, so $\|N\| < \|x\|$ by part 1), and

$$\langle N, x\rangle = \|N\|^2 - \epsilon$$


for some $\epsilon > 0$. Then for the open line segment $f(t) = tN + (1 - t)x$ with $0 < t < 1$, we have

$$\begin{aligned}
\|f(t)\|^2 &= \|tN + (1 - t)x\|^2 \\
&= t^2\|N\|^2 + 2t(1 - t)\langle N, x\rangle + (1 - t)^2\|x\|^2 \\
&\le (2t - t^2)\|N\|^2 - 2t(1 - t)\epsilon + (1 - t)^2\|x\|^2 \\
&= \left(-\|N\|^2 + 2\epsilon + \|x\|^2\right)t^2 + 2\left(\|N\|^2 - \epsilon - \|x\|^2\right)t + \|x\|^2
\end{aligned}$$

Let $p(t)$ denote the final expression above, which is a quadratic in $t$. It is easy to see that $p(t)$ has its minimum at the interior point of the line segment $[N, x]$ corresponding to

$$t_0 = \frac{-\|N\|^2 + \epsilon + \|x\|^2}{-\|N\|^2 + 2\epsilon + \|x\|^2}$$

and so $\|f(t_0)\|^2 \le p(t_0) < p(1) = \|N\|^2$, which is a contradiction. □
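
Theorem 15.5 can be checked numerically in a case where the minimum-norm vector has a closed form. The sketch below assumes that $C$ is an axis-aligned box (a choice made here for illustration, not one taken from the text); the helper names are hypothetical. For a box, the minimum-norm point is obtained by clamping 0 into each coordinate interval, and because $\langle N, x\rangle$ is linear in $x$, it suffices to test the inequality $\langle N, x\rangle \ge \|N\|^2$ at the corners.

```python
from itertools import product

def dot(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def min_norm_point(lo, hi):
    """Minimum-norm point of the box [lo[0], hi[0]] x ... x [lo[n-1], hi[n-1]].

    Coordinatewise, the value in [l, h] closest to 0 is 0 clamped to [l, h].
    """
    return [min(max(0, l), h) for l, h in zip(lo, hi)]

def separation_holds(lo, hi):
    """Check <N, x> >= ||N||^2 at every corner x of the box.

    A linear functional on a box attains its minimum at a corner,
    so the corners decide the inequality for the whole box.
    """
    N = min_norm_point(lo, hi)
    corners = product(*zip(lo, hi))
    return all(dot(N, c) >= dot(N, N) for c in corners)
```

For instance, for the box $[2,3] \times [1,4]$ the minimum-norm point is $(2,1)$, and the corner $(2,1)$ itself gives equality in the separation inequality.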


The next result brings us closer to our goal by replacing a single vector $b$ with a subspace $S$ disjoint from $C$. However, we must also require that $C$ be bounded, and therefore compact.


Theorem 15.6 Let $C$ be a compact convex subset of $\mathbb{R}^n$ and let $S$ be a subspace of $\mathbb{R}^n$ such that $C \cap S = \emptyset$. Then there exists a nonzero $N \in S^\perp$ such that

$$\langle N, x\rangle \ge \|N\|^2 > 0$$

for all $x \in C$. Hence, the hyperplane $H(N, \|N\|^2/2)$ strongly separates $S$ and $C$.

Proof. Theorem 15.2 implies that the set $S + C$ is closed and convex. Furthermore, $C \cap S = \emptyset$ implies that $0 \notin S + C$ and so Theorem 15.5 implies that $S + C$ can be strongly separated from the origin. Hence, there is a nonzero $N \in \mathbb{R}^n$ such that

$$\langle N, s\rangle + \langle N, c\rangle = \langle N, s + c\rangle \ge \|N\|^2$$

for all $s \in S$ and $c \in C$. But if $\langle N, s\rangle \ne 0$ for some $s \in S$, then we can replace $s$ by an appropriate scalar multiple of $s$ in order to make the left side of this inequality negative, which is impossible. Hence, $\langle N, s\rangle = 0$ for all $s \in S$, that is, $N \in S^\perp$ and

$$\langle N, C\rangle \ge \|N\|^2$$ □

We can now prove (15.1) and (15.2).


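Theorem 15.7 Let $S$ be a subspace of $\mathbb{R}^n$.
1) $S \cap \mathbb{R}^n_+ = \emptyset$ if and only if $S^\perp \cap \mathbb{R}^n_{++} \ne \emptyset$

2) $S \cap \mathbb{R}^n_{++} = \emptyset$ if and only if $S^\perp \cap \mathbb{R}^n_+ \ne \emptyset$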

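Proof. In both cases, one direction is easy. It is clear that there cannot exist vectors $u \in \mathbb{R}^n_{++}$ and $v \in \mathbb{R}^n_+$ that are orthogonal. Hence, $S \cap \mathbb{R}^n_+$ and $S^\perp \cap \mathbb{R}^n_{++}$ cannot both be nonempty and so $S^\perp \cap \mathbb{R}^n_{++} \ne \emptyset$ implies $S \cap \mathbb{R}^n_+ = \emptyset$. Also, $S \cap \mathbb{R}^n_{++}$ and $S^\perp \cap \mathbb{R}^n_+$ cannot both be nonempty and so $S^\perp \cap \mathbb{R}^n_+ \ne \emptyset$ implies that $S \cap \mathbb{R}^n_{++} = \emptyset$.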


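For the converse in part 1), to prove that

$$S \cap \mathbb{R}^n_+ = \emptyset \;\Rightarrow\; S^\perp \cap \mathbb{R}^n_{++} \ne \emptyset$$

a good candidate for an element of $S^\perp \cap \mathbb{R}^n_{++}$ would be a normal to a hyperplane that separates $S$ from a subset of $\mathbb{R}^n_+$. Note that our separation theorems do not allow us to separate $S$ from $\mathbb{R}^n_+$, because $\mathbb{R}^n_+$ is not compact. So consider instead the convex hull $\Delta$ of the standard basis vectors $\epsilon_1, \ldots, \epsilon_n$ in $\mathbb{R}^n_+$:

$$\Delta = \{t_1\epsilon_1 + \cdots + t_n\epsilon_n \mid 0 \le t_i \le 1, \ \textstyle\sum t_i = 1\}$$

which is compact. Moreover, $\Delta \subseteq \mathbb{R}^n_+$ implies that $\Delta \cap S = \emptyset$ and so Theorem 15.6 implies that there is a nonzero vector $N = (a_1, \ldots, a_n) \in S^\perp$ such that

$$\langle N, \delta\rangle \ge \|N\|^2$$

for all $\delta \in \Delta$. Taking $\delta = \epsilon_i$ gives

$$a_i = \langle N, \epsilon_i\rangle \ge \|N\|^2 > 0$$

and so $N \in S^\perp \cap \mathbb{R}^n_{++}$, which is therefore nonempty.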

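To prove part 2), again we note that there cannot exist orthogonal vectors $u \in \mathbb{R}^n_{++}$ and $v \in \mathbb{R}^n_+$ and so $S \cap \mathbb{R}^n_{++}$ and $S^\perp \cap \mathbb{R}^n_+$ cannot both be nonempty. Thus, $S^\perp \cap \mathbb{R}^n_+ \ne \emptyset$ implies that $S \cap \mathbb{R}^n_{++} = \emptyset$.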


To finish the proof of part 2), we must prove that

$$S \cap \mathbb{R}^n_{++} = \emptyset \;\Rightarrow\; S^\perp \cap \mathbb{R}^n_+ \ne \emptyset$$

Let $\mathcal{B} = \{B_1, \ldots, B_k\}$ be a basis for $S$. Then $N = (a_1, \ldots, a_n) \in S^\perp$ if and only if $N \perp B_i$ for all $i$. In matrix terms, if

$$M = (m_{ij}) = (B_1 \mid B_2 \mid \cdots \mid B_k)$$

has rows $R_1, \ldots, R_n$, then $N \in S^\perp$ if and only if $NM = 0$, that is,

$$a_1R_1 + \cdots + a_nR_n = 0$$

Now, $S^\perp$ contains a strictly positive vector $N = (a_1, \ldots, a_n)$ if and only if this equation holds, where $a_i \ge 0$ for all $i$ and $a_j > 0$ for some $j$. Moreover, we may assume without loss of generality that $\sum a_i = 1$, or equivalently, that $0$ is in the convex hull $C$ of the rows of $M$. Hence,

$$S^\perp \cap \mathbb{R}^n_+ \ne \emptyset \iff 0 \in C$$

Thus, we wish to prove that

$$S \cap \mathbb{R}^n_{++} = \emptyset \;\Rightarrow\; 0 \in C$$

or equivalently,

$$0 \notin C \;\Rightarrow\; S \cap \mathbb{R}^n_{++} \ne \emptyset$$

Now we have something to separate. Since $C$ is closed and convex, Theorem 15.5 implies that there is a nonzero vector $B = (b_1, \ldots, b_k) \in \mathbb{R}^k$ for which

$$\langle B, C\rangle \ge \|B\|^2 > 0$$

Consider the vector

$$v = b_1B_1 + \cdots + b_kB_k \in S$$

The $i$th coordinate of $v$ is

$$b_1m_{i1} + \cdots + b_km_{ik} = \langle B, R_i\rangle \ge \|B\|^2 > 0$$

and so $v$ is strongly positive. Hence, $v \in S \cap \mathbb{R}^n_{++}$, which is therefore nonempty. □


Inhomogeneous Systems 
We now turn our attention to inhomogeneous systems

$$Ax = b$$


The following lemma is required. 


Lemma 15.8 Let $A \in \mathcal{M}_{m,n}(\mathbb{R})$. Then the set

$$C = \{Ay \mid y \in \mathbb{R}^n, \ y \ge 0\}$$

is a closed, convex cone.
Proof. We leave it as an exercise to prove that $C$ is a convex cone and omit the proof that $C$ is closed. □


Theorem 15.9 (Farkas's lemma) Let $A \in \mathcal{M}_{m,n}(\mathbb{R})$ and let $b \in \mathbb{R}^m$ be nonzero. Then exactly one of the following holds:

1) There is a strictly positive solution $u \in \mathbb{R}^n$ to the system $Ax = b$.

2) There is a vector $v \in \mathbb{R}^m$ for which $vA \le 0$ and $\langle v, b\rangle > 0$.

Proof. Suppose first that 1) holds. If 2) also holds, then

$$(vA)u = v(Au) = \langle v, b\rangle > 0$$

However, $vA \le 0$ and $u > 0$ imply that $(vA)u \le 0$. This contradiction implies that 2) cannot hold.
Assume now that 1) fails to hold. By Lemma 15.8, the set

$$C = \{Ay \mid y \in \mathbb{R}^n, \ y \ge 0\} \subseteq \mathbb{R}^m$$

is closed and convex. The fact that 1) fails to hold is equivalent to $b \notin C$. Hence, there is a hyperplane that strongly separates $b$ and $C$. All we require is that $b$ and $C$ be strictly separated, that is, for some $\alpha \in \mathbb{R}$ and $v \in \mathbb{R}^m$,

$$\langle v, x\rangle < \alpha < \langle v, b\rangle \quad \text{for all } x \in C$$

Since $0 \in C$, it follows that $\alpha > 0$ and so $\langle v, b\rangle > 0$. Also, the first inequality is equivalent to $\langle v, Ay\rangle < \alpha$, that is,

$$\langle A^t v, y\rangle < \alpha$$

for all $y \in \mathbb{R}^n$, $y \ge 0$. We claim that this implies that $A^t v$ cannot have any positive coordinates and thus $vA \le 0$. For if the $i$th coordinate $(A^t v)_i$ is positive, then taking $y = \lambda\epsilon_i$ for $\lambda > 0$ we get

$$\lambda (A^t v)_i < \alpha$$

which does not hold for large $\lambda$. Thus, 2) holds.
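
The two alternatives in Farkas's lemma are easy to verify once a candidate vector is in hand. The following is a minimal sketch of such a check, not an algorithm for finding the vectors; the function names are hypothetical, and "strictly positive" is taken to mean nonnegative and nonzero, as the surrounding text appears to use the term. Integer or rational entries are assumed so that the equality test is exact.

```python
def dot(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def is_solution(A, u, b):
    """Alternative 1): u is nonnegative and nonzero, and A u = b."""
    strictly_positive = all(x >= 0 for x in u) and any(x > 0 for x in u)
    return strictly_positive and [dot(row, u) for row in A] == list(b)

def is_certificate(A, v, b):
    """Alternative 2): v A <= 0 coordinatewise and <v, b> > 0."""
    vA = [dot(v, col) for col in zip(*A)]  # zip(*A) iterates over columns of A
    return all(x <= 0 for x in vA) and dot(v, b) > 0
```

With $A$ the $2 \times 2$ identity matrix, $b = (1, 2)$ admits the solution $u = (1, 2)$, while $b = (1, -1)$ admits no nonnegative solution and $v = (0, -1)$ certifies this, since $vA = (0, -1) \le 0$ and $\langle v, b\rangle = 1 > 0$.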


In the exercises, we ask the reader to show that the previous result cannot be improved by replacing $vA \le 0$ in statement 2) with $vA \ll 0$.


Exercises 


1. Show that any hyperplane has the form $H(N, \|N\|^2)$ for an appropriate vector $N$.

2. If $A$ is an $m \times n$ matrix prove that the set $\{Ax \mid x \in \mathbb{R}^n, \ x > 0\}$ is a convex cone in $\mathbb{R}^m$.

3. If $A$ and $B$ are strictly separated subsets of $\mathbb{R}^n$ and if $A$ is finite, prove that $A$ and $B$ are strongly separated as well.

4. Let $V$ be a vector space over a field $F$ with $\operatorname{char}(F) \ne 2$. Show that a subset $X$ of $V$ is closed under the taking of convex combinations of any two of its points if and only if $X$ is closed under the taking of arbitrary convex combinations, that is, for all $n \ge 1$,

$$x_1, \ldots, x_n \in X, \quad \sum_{i=1}^n r_i = 1, \quad 0 \le r_i \le 1 \;\Rightarrow\; \sum_{i=1}^n r_ix_i \in X$$

5. Explain why an $(n - 1)$-dimensional subspace of $\mathbb{R}^n$ is the solution set of a linear equation of the form $a_1x_1 + \cdots + a_nx_n = 0$.

6. Show that

$$H_+(N, b) \cap H_-(N, b) = H(N, b)$$

and that $H_+^\circ(N, b)$, $H_-^\circ(N, b)$ and $H(N, b)$ are pairwise disjoint and

$$H_+^\circ(N, b) \cup H_-^\circ(N, b) \cup H(N, b) = \mathbb{R}^n$$

7. A function $T: \mathbb{R}^n \to \mathbb{R}^m$ is affine if it has the form $T(v) = \tau v + b$ for $b \in \mathbb{R}^m$, where $\tau \in \mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$. Prove that if $C \subseteq \mathbb{R}^n$ is convex, then so is $T(C) \subseteq \mathbb{R}^m$.

8. Find a cone in $\mathbb{R}^2$ that is not convex. Prove that a subset $X$ of $\mathbb{R}^n$ is a convex cone if and only if $x, y \in X$ implies that $\lambda x + \mu y \in X$ for all $\lambda, \mu \ge 0$.

9. Prove that the convex hull of a set $\{x_1, \ldots, x_n\}$ in $\mathbb{R}^n$ is bounded, without using the fact that it is compact.

10. Suppose that a vector $x \in \mathbb{R}^n$ has two distinct representations as convex combinations of the vectors $v_1, \ldots, v_n$. Prove that the vectors $v_2 - v_1, \ldots, v_n - v_1$ are linearly dependent.

11. Suppose that $C$ is a nonempty convex subset of $\mathbb{R}^n$ and that $H(N, b)$ is a hyperplane disjoint from $C$. Prove that $C$ lies in one of the open half-spaces determined by $H(N, b)$.

12. Prove that the conclusion of Theorem 15.6 may fail if we assume only that $C$ is closed and convex.

13. Find two nonempty convex subsets of $\mathbb{R}^2$ that are strictly separated but not strongly separated.

14. Prove that $X$ and $Y$ are strongly separated by $H(N, b)$ if and only if

$$\langle N, x'\rangle > b \text{ for all } x' \in X_\epsilon \quad \text{and} \quad \langle N, y'\rangle < b \text{ for all } y' \in Y_\epsilon$$

where $X_\epsilon = X + \epsilon B(0, 1)$ and $Y_\epsilon = Y + \epsilon B(0, 1)$ and where $B(0, 1)$ is the closed unit ball.

15. Show that Farkas's lemma cannot be improved by replacing $vA \le 0$ in statement 2) with $vA \ll 0$. Hint: A nice counterexample exists for $m = 2$, $n = 3$.


Chapter 16 
Affine Geometry 


In this chapter, we will study the geometry of a finite-dimensional vector space 
V, along with its structure-preserving maps. Throughout this chapter, all vector 
spaces are assumed to be finite-dimensional. 


Affine Geometry 


The cosets of a quotient space have a special geometric name. 


Definition Let $S$ be a subspace of a vector space $V$. The coset

$$v + S = \{v + s \mid s \in S\}$$

is called a flat in $V$ with base $S$ and flat representative $v$. We also refer to $v + S$ as a translate of $S$. The set $\mathcal{A}(V)$ of all flats in $V$ is called the affine geometry of $V$. The dimension $\dim(\mathcal{A}(V))$ of $\mathcal{A}(V)$ is defined to be $\dim(V)$. □


While a flat may have many flat representatives, it only has one base since $x + S = y + T$ implies that $x \in y + T$ and so $x + S = y + T = x + T$, whence $S = T$.


Definition The dimension of a flat $x + S$ is $\dim(S)$. A flat of dimension $k$ is called a $k$-flat. A 0-flat is a point, a 1-flat is a line and a 2-flat is a plane. A flat of dimension $\dim(\mathcal{A}(V)) - 1$ is called a hyperplane. □


Definition Two flats $X = x + S$ and $Y = y + T$ are said to be parallel if $S \subseteq T$ or $T \subseteq S$. This is denoted by $X \parallel Y$. □


We will denote subspaces of $V$ by the letters $S, T, \ldots$ and flats in $V$ by $X, Y, \ldots$.


Here are some of the basic intersection properties of flats. 


Theorem 16.1 Let $S$ and $T$ be subspaces of $V$ and let $X = x + S$ and $Y = y + T$ be flats in $V$.
1) The following are equivalent:
   a) some translate of $X$ is in $Y$: $w + X \subseteq Y$ for some $w \in V$
   b) some translate of $S$ is in $T$: $v + S \subseteq T$ for some $v \in V$
   c) $S \subseteq T$
2) The following are equivalent:
   a) $X$ and $Y$ are translates: $w + X = Y$ for some $w \in V$
   b) $S$ and $T$ are translates: $v + S = T$ for some $v \in V$
   c) $S = T$
3) $X \cap Y \ne \emptyset$, $S \subseteq T \iff X \subseteq Y$
4) $X \cap Y \ne \emptyset$, $S = T \iff X = Y$
5) If $X \parallel Y$ then $X \subseteq Y$, $Y \subseteq X$ or $X \cap Y = \emptyset$
6) $X \parallel Y$ if and only if some translation of one of these flats is contained in the other.
Proof. If 1a) holds, then $-y + w + x + S \subseteq T$ and so 1b) holds. If 1b) holds, then $v \in T$ and so $S = (v + S) - v \subseteq T$ and so 1c) holds. If 1c) holds, then $y - x + X = y + S \subseteq y + T = Y$ and so 1a) holds. Part 2) is proved in a similar manner.

For part 3), $S \subseteq T$ implies that $v + X \subseteq Y$ for some $v \in V$ and so if $z \in X \cap Y$ then $v + z \in Y$. Since $z \in Y$ as well, it follows that $v \in T$, which implies that $X \subseteq Y$. Conversely, if $X \subseteq Y$ then part 1) implies that $S \subseteq T$. Part 4) follows similarly. We leave proof of 5) and 6) to the reader. □


Affine Combinations 
Let X be a nonempty subset of V. It is well known that 


1) X is a subspace of V if and only if X is closed under linear combinations, 
or equivalently, X is closed under linear combinations of any two vectors 
in X. 

2) The smallest subspace of V containing X is the set of all linear
combinations of elements of X. In different language, the linear hull of X
is equal to the linear span of X.


We wish to establish the corresponding properties of affine subspaces of V, 
beginning with the counterpart of a linear combination. 


Definition Let $V$ be a vector space and let $x_i \in V$. A linear combination

$$r_1x_1 + \cdots + r_nx_n$$

where $r_i \in F$ and $\sum r_i = 1$ is called an affine combination of the vectors $x_i$. □


Let us refer to a nonempty subset $X$ of $V$ as affine closed if $X$ is closed under any affine combination of vectors in $X$ and two-affine closed if $X$ is closed under affine combinations of any two vectors in $X$. These are not standard terms.


The line containing two distinct vectors $x, y \in V$ is the set

$$\overline{xy} = \{rx + (1 - r)y \mid r \in F\} = y + \langle x - y\rangle$$

of all affine combinations of $x$ and $y$. Thus, a subset $X$ of $V$ is two-affine closed if and only if $X$ contains the line through any two of its points.


Theorem 16.2 Let $V$ be a vector space over a field $F$ with $\operatorname{char}(F) \ne 2$. Then a subset $X$ of $V$ is affine closed if and only if it is two-affine closed.

Proof. The theorem is proved by induction on the number $n$ of terms in an affine combination. The case $n = 2$ holds by assumption. Assume the result true for affine combinations with fewer than $n \ge 3$ terms and consider the affine combination

$$z = r_1x_1 + \cdots + r_nx_n$$

where $n \ge 3$. There are two cases to consider. If either of $r_1$ and $r_2$ is not equal to 1, say $r_1 \ne 1$, write

$$z = r_1x_1 + (1 - r_1)\left[\frac{r_2}{1 - r_1}x_2 + \cdots + \frac{r_n}{1 - r_1}x_n\right]$$

and if $r_1 = r_2 = 1$, then since $\operatorname{char}(F) \ne 2$, we may write

$$z = 2\left[\frac{1}{2}x_1 + \frac{1}{2}x_2\right] + r_3x_3 + \cdots + r_nx_n$$

In either case, the inductive hypothesis applies to the expression inside the square brackets and then to $z$. □


The requirement $\operatorname{char}(F) \ne 2$ is necessary, for if $F = \mathbb{Z}_2$, then the subset

$$X = \{(0,0), (1,0), (0,1)\}$$

of $F^2$ is two-affine closed but not affine closed. We can now characterize flats.
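
The counterexample over $\mathbb{Z}_2$ is small enough to confirm by brute force. The sketch below (function names are illustrative, not from the text) enumerates every affine combination of $n$ points of $X$ with coefficients in $\mathbb{Z}_2$ and reports whether all of them stay inside $X$.

```python
from itertools import product

P = 2                                  # arithmetic is done modulo 2
X = {(0, 0), (1, 0), (0, 1)}

def affine_comb(coeffs, points):
    """The affine combination sum r_i x_i in (Z_2)^2.

    The coefficients are required to sum to 1 in Z_2.
    """
    assert sum(coeffs) % P == 1
    return tuple(
        sum(r * p[k] for r, p in zip(coeffs, points)) % P for k in range(2)
    )

def closed_under(n):
    """True if every affine combination of n points of X lies in X."""
    for points in product(X, repeat=n):
        for coeffs in product(range(P), repeat=n):
            if sum(coeffs) % P == 1 and affine_comb(coeffs, points) not in X:
                return False
    return True
```

Here `closed_under(2)` succeeds because the only two-term affine combinations over $\mathbb{Z}_2$ have coefficients $(1,0)$ or $(0,1)$, while `closed_under(3)` fails because $(0,0) + (1,0) + (0,1) = (1,1) \notin X$.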
Theorem 16.3 A nonempty subset $X$ of a vector space $V$ is a flat if and only if $X$ is affine closed. Moreover, if $\operatorname{char}(F) \ne 2$, then $X$ is a flat if and only if $X$ is two-affine closed.


Proof. Let $X = x + S$ be a flat and let $x_i = x + s_i \in X$, where $s_i \in S$. If $\sum r_i = 1$, then

$$\sum_i r_ix_i = \sum_i r_i(x + s_i) = x + \sum_i r_is_i \in x + S$$

and so $X$ is affine closed. Conversely, suppose that $X$ is affine closed, let $x \in X$ and let $S = X - x$. If $r_i \in F$ and $s_i \in S$ then

$$r_1s_1 + r_2s_2 = r_1(x_1 - x) + r_2(x_2 - x) = r_1x_1 + r_2x_2 - (r_1 + r_2)x$$

for $x_i \in X$. Since the sum of the coefficients of $x_1$, $x_2$ and $x$ in the last expression is 0, it follows that

$$r_1s_1 + r_2s_2 + x = r_1x_1 + r_2x_2 - (r_1 + r_2 - 1)x \in X$$

and so $r_1s_1 + r_2s_2 \in X - x = S$. Thus, $S$ is a subspace of $V$ and $X = x + S$ is a flat. The rest follows from Theorem 16.2. □


Affine Hulls 
The following definition is the analog of the subspace spanned by a collection of vectors.


Definition Let $X$ be a nonempty set of vectors in $V$.
1) The affine hull of $X$, denoted by $\operatorname{affhull}(X)$, is the smallest flat containing $X$.
2) The affine span of $X$, denoted by $\operatorname{affspan}(X)$, is the set of all affine combinations of vectors in $X$. □

Theorem 16.4 Let $X$ be a nonempty subset of $V$. Then

$$\operatorname{affhull}(X) = \operatorname{affspan}(X) = x + \operatorname{span}(X - x)$$

or equivalently, for a subspace $S$ of $V$,

$$x + S = \operatorname{affspan}(X) \iff S = \operatorname{span}(X - x)$$

Also,

$$\dim(\operatorname{affspan}(X)) = \dim(\operatorname{span}(X - x))$$

Proof. Theorem 16.3 implies that $\operatorname{affspan}(X) \subseteq \operatorname{affhull}(X)$ and so it is sufficient to show that $A = \operatorname{affspan}(X)$ is a flat, or equivalently, that for any $y \in A$, the set $S = A - y$ is a subspace of $V$. To this end, let

$$y = \sum_{i=1}^n r_{0,i}x_i$$

Then any two elements of $S$ have the form $y_1 - y$ and $y_2 - y$, where

$$y_1 = \sum_{i=1}^n r_{1,i}x_i \quad \text{and} \quad y_2 = \sum_{i=1}^n r_{2,i}x_i$$

are in $A$. But if $s, t \in F$, then

$$\begin{aligned}
z &= s(y_1 - y) + t(y_2 - y) \\
&= \sum_{i=1}^n \left(sr_{1,i} + tr_{2,i} - (s + t)r_{0,i}\right)x_i \\
&= \sum_{i=1}^n \left(sr_{1,i} + tr_{2,i} - (s + t - 1)r_{0,i}\right)x_i - y
\end{aligned}$$

which is in $A - y = S$, since the last sum is an affine sum. Hence, $S$ is a subspace of $V$. We leave the rest of the proof to the reader. □


The Lattice of Flats 


The intersection of subspaces is a subspace, although it may be trivial. For flats, 
if the intersection is not empty, then it is also a flat. However, since the 
intersection of flats may be empty, the set A(V) does not form a lattice under 
intersection. However, we can easily fix this. 
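Theorem 16.5 Let $V$ be a vector space. The set

$$\mathcal{A}_0(V) = \mathcal{A}(V) \cup \{\emptyset\}$$

of all flats in $V$, together with the empty set, is a complete lattice in which meet is intersection. In particular:

1) $\mathcal{A}_0(V)$ is closed under arbitrary intersection. In fact, if $\mathcal{F} = \{x_i + S_i \mid i \in K\}$ has nonempty intersection, then

$$\bigcap \mathcal{F} = \bigcap_{i \in K}(x_i + S_i) = x + \bigcap_{i \in K} S_i$$

for some $x \in \bigcap \mathcal{F}$. In other words, the base of the intersection is the intersection of the bases.

2) The join $\bigvee \mathcal{F}$ of the family $\mathcal{F} = \{x_i + S_i \mid i \in K\}$ is the intersection of all flats containing the members of $\mathcal{F}$. Also,

$$\bigvee \mathcal{F} = \operatorname{affhull}\left(\bigcup \mathcal{F}\right)$$

3) If $X = x + S$ and $Y = y + T$ are flats in $V$, then

$$X \vee Y = x + (\langle x - y\rangle + S + T)$$

If $X \cap Y \ne \emptyset$, then

$$X \vee Y = x + (S + T)$$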
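Proof. For part 1), if

$$x \in \bigcap_{i \in K}(x_i + S_i)$$

then $x_i + S_i = x + S_i$ for all $i \in K$ and so

$$\bigcap_{i \in K}(x_i + S_i) = \bigcap_{i \in K}(x + S_i) = x + \bigcap_{i \in K} S_i$$

We leave proof of part 2) to the reader.


For part 3), since $x, y \in X \vee Y$, it follows that

$$X \vee Y = x + U = y + U$$

for some subspace $U$ of $V$. Thus, $x - y \in U$. Also, $x + S \subseteq x + U$ implies that $S \subseteq U$ and similarly $T \subseteq U$, whence $S + T \subseteq U$ and so if $W = \langle x - y\rangle + S + T$, then $W \subseteq U$. Hence, $x + W \subseteq x + U = X \vee Y$. On the other hand,

$$X = x + S \subseteq x + W$$

and

$$Y = y + T = x - (x - y) + T \subseteq x + W$$

and so $X \vee Y \subseteq x + W$. Thus, $X \vee Y = x + W$.


If $X \cap Y \ne \emptyset$, then we may take the flat representatives for $X$ and $Y$ to be any element $z \in X \cap Y$, in which case the formula just proved gives

$$X \vee Y = z + (\langle z - z\rangle + S + T) = z + S + T$$

and since $x \in X \vee Y$, we also have $X \vee Y = x + S + T$. □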
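We can now describe the dimension of the join of two flats.


Theorem 16.6 Let $X = x + S$ and $Y = y + T$ be flats in $V$.
1) If $X \cap Y \ne \emptyset$, then

$$\dim(X \vee Y) = \dim(S + T) = \dim(X) + \dim(Y) - \dim(X \cap Y)$$

2) If $X \cap Y = \emptyset$, then

$$\dim(X \vee Y) = \dim(S + T) + 1$$

Proof. We have seen that if $X \cap Y \ne \emptyset$, then

$$X \vee Y = x + S + T$$

and so

$$\dim(X \vee Y) = \dim(S + T)$$

On the other hand, if $X \cap Y = \emptyset$, then

$$X \vee Y = x + (\langle x - y\rangle + S + T)$$

Here $x - y \notin S + T$, for if $x - y = s + t$ with $s \in S$ and $t \in T$, then $x - s = y + t$ would lie in $X \cap Y$. Since $\dim(\langle x - y\rangle) = 1$, we get

$$\dim(X \vee Y) = \dim(S + T) + 1$$

Finally, we have

$$\dim(S + T) = \dim(S) + \dim(T) - \dim(S \cap T)$$

and Theorem 16.5 implies that

$$\dim(X \cap Y) = \dim(S \cap T)$$ □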
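
Theorem 16.6 can be tried out on concrete flats over $\mathbb{Q}$. The following is a sketch under the assumption that each flat is given by a representative and a spanning list for its base; `rank`, `join_dim` and `flats_meet` are hypothetical helper names. The join dimension is computed as the rank of $\langle x - y\rangle + S + T$, and the flats meet exactly when $x - y \in S + T$.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of vectors over Q, by Gaussian elimination."""
    m = [[Fraction(a) for a in r] for r in rows]
    ncols = len(m[0]) if m else 0
    rk = 0
    for c in range(ncols):
        # find a row at or below position rk with a nonzero entry in column c
        pivot = next((i for i in range(rk, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(rk + 1, len(m)):
            f = m[i][c] / m[rk][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk

def join_dim(x, S, y, T):
    """dim(X v Y) for X = x + span(S) and Y = y + span(T)."""
    d = [a - b for a, b in zip(x, y)]
    return rank([d] + S + T)

def flats_meet(x, S, y, T):
    """True if X and Y intersect, that is, if x - y lies in span(S) + span(T)."""
    d = [a - b for a, b in zip(x, y)]
    return rank(S + T) == rank(S + T + [d])
```

For two skew lines in $\mathbb{Q}^3$, such as $X = \langle e_1\rangle$ and $Y = e_3 + \langle e_2\rangle$, the flats do not meet and the join has dimension $\dim(S + T) + 1 = 3$, in agreement with part 2).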


Affine Independence 
We now discuss the affine counterpart of linear independence. 
Theorem 16.7 Let $X$ be a nonempty set of vectors in $V$. The following are equivalent:
1) For all $x \in X$, the set

$$(X - x) \setminus \{0\}$$

is linearly independent.
2) For all $x \in X$,

$$x \notin \operatorname{affhull}(X \setminus \{x\})$$

3) For any vectors $x_i \in X$,

$$\sum_i r_ix_i = 0, \ \sum_i r_i = 0 \;\Rightarrow\; r_i = 0 \text{ for all } i$$

4) For affine combinations of vectors in $X$,

$$\sum_i r_ix_i = \sum_i s_ix_i \;\Rightarrow\; r_i = s_i \text{ for all } i$$

5) When $X = \{x_1, \ldots, x_n\}$ is finite,

$$\dim(\operatorname{affhull}(X)) = n - 1$$

A set $X$ of vectors satisfying any (and hence all) of these conditions is said to be affinely independent.
Proof. If 1) holds but there is an affine combination equal to $x$,

$$x = \sum_{i=1}^n r_ix_i$$

where $x_i \ne x$ for all $i$, then, since $\sum r_i = 1$,

$$\sum_{i=1}^n r_i(x_i - x) = 0$$

Since $r_i$ is nonzero for some $i$, this contradicts 1). Hence, 1) implies 2). Suppose that 2) holds and

$$\sum_i r_ix_i = 0$$

where $\sum r_i = 0$. If some $r_i$, say $r_1$, is nonzero then

$$x_1 = -\sum_{i > 1} (r_i/r_1)x_i \in \operatorname{affhull}(X \setminus \{x_1\})$$

which contradicts 2) and so $r_i = 0$ for all $i$. Hence, 2) implies 3).


If 3) holds and the affine combinations satisfy

$$\sum_i r_ix_i = \sum_i s_ix_i$$

then

$$\sum_i (r_i - s_i)x_i = 0$$

and since $\sum (r_i - s_i) = 1 - 1 = 0$, it follows that $r_i = s_i$ for all $i$. Hence, 4) holds. Thus, it is clear that 3) and 4) are equivalent. If 3) holds and

$$\sum_i r_i(x_i - x) = 0$$

for $x \ne x_i$, then

$$\sum_i r_ix_i - \left(\sum_i r_i\right)x = 0$$

and so 3) implies that $r_i = 0$ for all $i$.


Finally, suppose that $X = \{x_1, \ldots, x_n\}$. Since

$$\dim(\operatorname{affhull}(X)) = \dim(\langle X - x_i\rangle)$$

it follows that 5) holds if and only if $(X - x_i) \setminus \{0\}$, which has size $n - 1$, is linearly independent. □


Affinely independent sets enjoy some of the basic properties of linearly 
independent sets. For example, a nonempty subset of an affinely independent set 
is affinely independent. Also, any nonempty set X contains an affinely 
independent set. 


Since the affine hull H = affhull(X) of an affinely independent set X is not the 
affine hull of any proper subset of X, we deduce that X is a minimal affine 
spanning set of its affine hull. 


Affine Bases and Barycentric Coordinates
We have seen that a set $X$ is affinely independent if and only if the set

$$X_x = (X - x) \setminus \{0\}$$

is linearly independent. We have also seen that for a subspace $S$ of $V$,

$$x + S = \operatorname{affspan}(X) \iff S = \operatorname{span}(X_x)$$

Therefore, if by analogy, we define a subset $\mathcal{B}$ of a flat $A = x + S$ to be an affine basis for $A$ if $\mathcal{B}$ is affinely independent and $\operatorname{affspan}(\mathcal{B}) = A$, then $\mathcal{B}$ is an affine basis for $x + S$ if and only if $\mathcal{B}_x$ is a basis for $S$.


Theorem 16.8 Let $A = x + S$ be a flat of dimension $n$. Let $\mathcal{B} = (x_1, \ldots, x_n)$ be an ordered basis for $S$ and let $(\mathcal{B} + x) \cup \{x\} = (x_1 + x, \ldots, x_n + x, x)$ be an ordered affine basis for $A$. Then every $v \in A$ has a unique expression as an affine combination

$$v = r_1(x_1 + x) + \cdots + r_n(x_n + x) + r_{n+1}x$$

The coefficients $r_i$ are called the barycentric coordinates of $v$ with respect to the ordered affine basis $(\mathcal{B} + x) \cup \{x\}$.


For example, in $\mathbb{R}^3$, a plane is a flat of the form $A = x + \langle v_1, v_2\rangle$ where $\mathcal{B} = (v_1, v_2)$ is an ordered basis for a two-dimensional subspace of $\mathbb{R}^3$. Then

$$(\mathcal{B} + x) \cup \{x\} = (v_1 + x, v_2 + x, x) = (p_1, p_2, p_3)$$

is an ordered affine basis that provides barycentric coordinates for the plane, that is, any $v \in A$ has the form

$$r_1p_1 + r_2p_2 + r_3p_3$$

where $r_1 + r_2 + r_3 = 1$.
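
The barycentric coordinates of a point in such a plane can be computed by solving for the coefficients of $v - x$ in terms of $v_1, v_2$. The sketch below is one possible way to do this, assuming rational data and the standard inner product on $\mathbb{R}^3$; it solves the $2 \times 2$ Gram system by Cramer's rule, and the function name `barycentric` is hypothetical.

```python
from fractions import Fraction

def dot(u, v):
    """Standard inner product."""
    return sum(a * b for a, b in zip(u, v))

def barycentric(v, x, v1, v2):
    """Coordinates (r1, r2, r3) of v with respect to (v1 + x, v2 + x, x).

    Assumes v1, v2 are linearly independent; raises ValueError if v does
    not lie in the flat x + <v1, v2>.
    """
    w = [Fraction(a) - b for a, b in zip(v, x)]        # w = v - x
    g11, g12, g22 = dot(v1, v1), dot(v1, v2), dot(v2, v2)
    det = g11 * g22 - g12 * g12                        # nonzero by independence
    c1, c2 = dot(w, v1), dot(w, v2)
    r1 = (c1 * g22 - c2 * g12) / det                   # Cramer's rule
    r2 = (g11 * c2 - g12 * c1) / det
    if [r1 * a + r2 * b for a, b in zip(v1, v2)] != w:
        raise ValueError("v is not in the flat x + <v1, v2>")
    return r1, r2, 1 - r1 - r2
```

With $x = (1,0,0)$, $v_1 = (1,1,0)$ and $v_2 = (0,1,1)$, the point $v = (3,1,-1) = x + 2v_1 - v_2$ has barycentric coordinates $(2, -1, 0)$ with respect to $(p_1, p_2, p_3) = ((2,1,0), (1,1,1), (1,0,0))$.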
Affine Transformations 


Now let us discuss some properties of maps that preserve affine structure.
Definition A function $f: V \to V$ that preserves affine combinations, that is, for which

$$\sum_i r_i = 1 \;\Rightarrow\; f\left(\sum_i r_ix_i\right) = \sum_i r_if(x_i)$$

is called an affine transformation (or affine map, or affinity). □


We should mention that some authors require that f be bijective in order to be 
an affine map. The following theorem is the analog of Theorem 16.2. 


Theorem 16.9 If $\operatorname{char}(F) \ne 2$, then a function $f: V \to V$ is an affine transformation if and only if it preserves affine combinations of every pair of its points, that is, if and only if

$$f(rx + (1 - r)y) = rf(x) + (1 - r)f(y)$$ □


Thus, if $\operatorname{char}(F) \ne 2$, then a map $f$ is an affine transformation if and only if it sends the line through $x$ and $y$ to the line through $f(x)$ and $f(y)$. It is clear that linear transformations are affine transformations. So are the following maps.


Definition Let $v \in V$. The affine map $T_v: V \to V$ defined by

$$T_v(x) = x + v$$

for all $x \in V$, is called translation by $v$. □


It is not hard to see that any composition $T_v \circ \tau$, where $\tau \in \mathcal{L}(V)$, is affine. Conversely, any affine map must have this form.


Theorem 16.10 A function $f: V \to V$ is an affine transformation if and only if it is a linear operator followed by a translation,

$$f = T_v \circ \tau$$

where $v \in V$ and $\tau \in \mathcal{L}(V)$.

Proof. We leave proof that $T_v \circ \tau$ is an affine transformation to the reader. Let $f$ be an affine map and suppose that $f0 = -z$. Then $T_z \circ f0 = 0$. Moreover, letting $\tau = T_z \circ f$, we have

$$\begin{aligned}
\tau(ru + sv) &= f(ru + sv) + z \\
&= f(ru + sv + (1 - r - s)0) + z \\
&= rfu + sfv - (1 - r - s)z + z \\
&= r\tau u + s\tau v
\end{aligned}$$

and so $\tau$ is linear. □
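
The proof suggests a recipe for splitting an affine map into its two parts: the translation vector is $f(0)$ and the linear part is $x \mapsto f(x) - f(0)$. A minimal sketch of that recipe follows; the map `f` is a made-up example on $\mathbb{Q}^2$ and `decompose` is a hypothetical helper name.

```python
def decompose(f, n):
    """Return (v, tau) with f = T_v o tau, for an affine map f on an n-dimensional space.

    Here v = f(0) and tau(x) = f(x) - f(0), matching tau = T_z o f with z = -f(0).
    """
    v = f([0] * n)

    def tau(x):
        return [a - b for a, b in zip(f(x), v)]

    return v, tau

def f(p):
    """An illustrative affine map: a linear map followed by translation by (3, -1)."""
    x, y = p
    return [2 * x + y + 3, x - y - 1]
```

For this `f`, the decomposition yields $v = (3, -1)$ and $\tau(x, y) = (2x + y, \, x - y)$, which fixes the origin and respects linear combinations.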


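Corollary 16.11

1) The composition of two affine transformations is an affine transformation.

2) An affine transformation $f = T_v \circ \tau$ is bijective if and only if $\tau$ is bijective.

3) The set $\operatorname{aff}(V)$ of all bijective affine transformations on $V$ is a group under composition of maps, called the affine group of $V$. □


Let us make a few group-theoretic remarks about $\operatorname{aff}(V)$. The set $\operatorname{trans}(V)$ of all translations of $V$ is a subgroup of $\operatorname{aff}(V)$. We can define a function $\phi: \operatorname{aff}(V) \to \mathcal{L}(V)$ by

$$\phi(T_v \circ \tau) = \tau$$

It is not hard to see that $\phi$ is a well-defined group homomorphism from $\operatorname{aff}(V)$ onto the group of invertible operators in $\mathcal{L}(V)$, with kernel $\operatorname{trans}(V)$. Hence, $\operatorname{trans}(V)$ is a normal subgroup of $\operatorname{aff}(V)$ and

$$\frac{\operatorname{aff}(V)}{\operatorname{trans}(V)} \approx \{\tau \in \mathcal{L}(V) \mid \tau \text{ is invertible}\}$$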


Projective Geometry 


If dim(V) = 2, the join (affine hull) of any two distinct points in V is a line. On 
the other hand, it is not the case that the intersection of any two lines is a point, 
since the lines may be parallel. Thus, there is a certain asymmetry between the 
concepts of points and lines in V. This asymmetry can be removed by 
constructing the projective plane. Our plan here is to very briefly describe one 
possible construction of projective geometries of all dimensions. 


By way of motivation, let us consider Figure 16.1. 


Figure 16.1 


Note that $H$ is a hyperplane in a 3-dimensional vector space $V$ and that $0 \notin H$.
Now, the set A(H) of all flats of V that lie in H is an affine geometry of 
dimension 2. (According to our definition of affine geometry, H must be a 
vector space in order to define A(H). However, we hereby extend the definition 
of affine geometry to include the collection of all flats contained in a flat of V.) 


Figure 16.1 shows a one-dimensional flat $X$ and its linear span $\langle X\rangle$, as well as a zero-dimensional flat $Y$ and its span $\langle Y\rangle$. Note that, for any flat $X$ in $H$, we have

$$\dim(\langle X\rangle) = \dim(X) + 1$$

Note also that if $L_1$ and $L_2$ are any two distinct lines in $H$, the corresponding planes $\langle L_1\rangle$ and $\langle L_2\rangle$ have the property that their intersection is a line through the origin, even if the lines are parallel. We are now ready to define projective geometries.


Let $V$ be a vector space of any dimension and let $H$ be a hyperplane in $V$ not containing the origin. To each flat $X$ in $H$, we associate the subspace $\langle X\rangle$ of $V$ generated by $X$. Thus, the linear span function $P: \mathcal{A}(H) \to \mathcal{S}(V)$ maps affine subspaces $X$ of $H$ to subspaces $\langle X\rangle$ of $V$. The span function is not surjective: Its image is the set of all subspaces that are not contained in the base subspace $K$ of the flat $H$.


The linear span function is one-to-one and its inverse is intersection with $H$,

$$P^{-1}U = U \cap H$$

for any subspace $U$ not contained in $K$.


The affine geometry A(H) is, as we have remarked, somewhat incomplete. In 
the case dim( H) = 2, every pair of points determines a line but not every pair 
of lines determines a point. 


Now, since the linear span function P is injective, we can identify A(H) with 
its image P(A(H)), which is the set of all subspaces of V not contained in the 
base subspace K. This view of A(H) allows us to “complete” A(H) by 
including the base subspace K. In the three-dimensional case of Figure 16.1, the 
base plane, in effect, adds a projective line at infinity. With this inclusion, every 
pair of lines intersects, parallel lines intersecting at a point on the line at infinity. 
This two-dimensional projective geometry is called the projective plane. 


Definition Let $V$ be a vector space. The set $\mathcal{S}(V)$ of all subspaces of $V$ is called the projective geometry of $V$. The projective dimension $\operatorname{pdim}(S)$ of $S \in \mathcal{S}(V)$ is defined as

$$\operatorname{pdim}(S) = \dim(S) - 1$$

The projective dimension of $\mathcal{S}(V)$ is defined to be $\operatorname{pdim}(V) = \dim(V) - 1$. A subspace of projective dimension 0 is called a projective point and a subspace of projective dimension 1 is called a projective line. □


Thus, referring to Figure 16.1, a projective point is a line through the origin and, 
provided that it is not contained in the base plane K, it meets H in an affine 
point. Similarly, a projective line is a plane through the origin and, provided that 
it is not K, it will meet H in an affine line. In short, 


span(affine point) = line through the origin = projective point 
span(affine line) = plane through the origin = projective line 


The linear span function has the following properties. 


Theorem 16.12 The linear span function $P: \mathcal{A}(H) \to \mathcal{S}(V)$ from the affine geometry $\mathcal{A}(H)$ to the projective geometry $\mathcal{S}(V)$ defined by $PX = \langle X\rangle$ satisfies the following properties:

1) The linear span function is injective, with inverse given by

$$P^{-1}U = U \cap H$$

for all subspaces $U$ not contained in the base subspace $K$ of $H$.

2) The image of the span function is the set of all subspaces of $V$ that are not contained in the base subspace $K$ of $H$.

3) $X \subseteq Y$ if and only if $\langle X\rangle \subseteq \langle Y\rangle$

4) If $X_i$ are flats in $H$ with nonempty intersection, then

$$\operatorname{span}\left(\bigcap_{i \in K} X_i\right) = \bigcap_{i \in K} \operatorname{span}(X_i)$$

5) For any collection of flats in $H$,

$$\operatorname{span}\left(\bigvee_{i \in K} X_i\right) = \sum_{i \in K} \operatorname{span}(X_i)$$

6) The linear span function preserves dimension, in the sense that

$$\operatorname{pdim}(\operatorname{span}(X)) = \dim(X)$$

7) $X \parallel Y$ if and only if one of $\langle X\rangle \cap K$ and $\langle Y\rangle \cap K$ is contained in the other.

Proof. To prove part 1), let $x + S$ be a flat in $H$. Then $x \in H$ and so $H = x + K$, which implies that $S \subseteq K$. Note also that $\langle x + S\rangle = \langle x\rangle + S$ and

$$z \in \langle x + S\rangle \cap H = (\langle x\rangle + S) \cap (x + K) \;\Rightarrow\; z = rx + s = x + k$$

for some $s \in S$, $k \in K$ and $r \in F$. This implies that $(1 - r)x \in K$, which implies that either $x \in K$ or $r = 1$. But $x \in H$ implies $x \notin K$ and so $r = 1$, which implies that $z = x + s \in x + S$. In other words,

$$\langle x + S\rangle \cap H \subseteq x + S$$

Since the reverse inclusion is clear, we have

$$\langle x + S\rangle \cap H = x + S$$

This establishes 1).

To prove 2), let $U$ be a subspace of $V$ that is not contained in $K$. We wish to show that $U$ is in the image of the linear span function. Note first that since $U \not\subseteq K$ and $\dim(K) = \dim(V) - 1$, we have $U + K = V$ and so

$$\dim(U \cap K) = \dim(U) + \dim(K) - \dim(U + K) = \dim(U) - 1$$

Now let $0 \ne x \in U \setminus K$. Then

$$\begin{aligned}
x \notin K &\Rightarrow \langle x\rangle + K = V \\
&\Rightarrow rx + k \in H \text{ for some } 0 \ne r \in F, \ k \in K \\
&\Rightarrow rx \in H
\end{aligned}$$

Thus, $rx \in U \cap H$ for some $0 \ne r \in F$. Hence, the flat $rx + (U \cap K)$ lies in $H$ and

$$\dim(rx + (U \cap K)) = \dim(U \cap K) = \dim(U) - 1$$

which implies that $\operatorname{span}(rx + (U \cap K)) = \langle rx\rangle + (U \cap K)$ lies in $U$ and has the same dimension as $U$. In other words,

$$\operatorname{span}(rx + (U \cap K)) = \langle rx\rangle + (U \cap K) = U$$

We leave proof of the remaining parts of the theorem as exercises. □


Exercises 


1. Show that if $x_1, \ldots, x_n \in V$, then the set $S = \{\sum r_ix_i \mid \sum r_i = 0\}$ is a subspace of $V$.

2. Prove that if $X \subseteq V$ is nonempty then

$$\operatorname{affhull}(X) = x + \langle X - x\rangle$$

3. Prove that the set $X = \{(0,0), (1,0), (0,1)\}$ in $(\mathbb{Z}_2)^2$ is closed under the formation of lines, but not affine hulls.

4. Prove that a flat contains the origin 0 if and only if it is a subspace.

5. Prove that a flat $X$ is a subspace if and only if for some $x \in X$ we have $rx \in X$ for some $1 \ne r \in F$.

6. Show that the join of a collection $\mathcal{C} = \{x_i + S_i \mid i \in K\}$ of flats in $V$ is the intersection of all flats that contain all flats in $\mathcal{C}$.

7. Is the collection of all flats in $V$ a lattice under set inclusion? If not, how can you "fix" this?

8. Suppose that $X = x + S$ and $Y = y + T$. Prove that if $\dim(X) = \dim(Y)$ and $X \parallel Y$, then $S = T$.

9. Suppose that $X = x + S$ and $Y = y + T$ are disjoint hyperplanes in $V$. Show that $S = T$.

10. (The parallel postulate) Let $X$ be a flat in $V$ and $v \notin X$. Show that there is exactly one flat containing $v$, parallel to $X$ and having the same dimension as $X$.

11. a) Find an example to show that the join $X \vee Y$ of two flats may not be the set of all lines connecting all points in the union of these flats.
    b) Show that if $X$ and $Y$ are flats with $X \cap Y \ne \emptyset$, then $X \vee Y$ is the union of all lines $\overline{xy}$ where $x \in X$ and $y \in Y$.

12. Show that if $X \parallel Y$ and $X \cap Y = \emptyset$, then

$$\dim(X \vee Y) = \max\{\dim(X), \dim(Y)\} + 1$$

13. Let $\dim(V) = 2$. Prove the following:
    a) The join of any two distinct points is a line.
    b) The intersection of any two nonparallel lines is a point.

14. Let $\dim(V) = 3$. Prove the following:
    a) The join of any two distinct points is a line.
    b) The intersection of any two nonparallel planes is a line.
    c) The join of any two lines whose intersection is a point is a plane.
    d) The intersection of two coplanar nonparallel lines is a point.
    e) The join of any two distinct parallel lines is a plane.
    f) The join of a line and a point not on that line is a plane.
    g) The intersection of a plane and a line not on that plane is a point.

15. Prove that $f: V \to V$ is a surjective affine transformation if and only if $f = \tau \circ T_w$ for some $w \in V$ and $\tau \in \mathcal{L}(V)$.

16. Verify the group-theoretic remarks about the group homomorphism $\phi: \operatorname{aff}(V) \to \mathcal{L}(V)$ and the subgroup $\operatorname{trans}(V)$ of $\operatorname{aff}(V)$.


Chapter 17 
Singular Values and the Moore-Penrose Inverse


Singular Values 


Let $U$ and $V$ be finite-dimensional inner product spaces over $\mathbb{C}$ or $\mathbb{R}$ and let $\tau \in \mathcal{L}(U, V)$. The spectral theorem applied to $\tau^*\tau$ can be of considerable help in understanding the relationship between $\tau$ and its adjoint $\tau^*$. This relationship is shown in Figure 17.1. Note that $U$ and $V$ can be decomposed into direct sums

$$U = A \oplus B \quad \text{and} \quad V = C \oplus D$$

in such a manner that $\tau: A \to C$ and $\tau^*: C \to A$ act symmetrically in the sense that

$$\tau: u_i \mapsto s_iv_i \quad \text{and} \quad \tau^*: v_i \mapsto s_iu_i$$

Also, both $\tau$ and $\tau^*$ are zero on $B$ and $D$, respectively.


We begin by noting that $\tau^*\tau \in \mathcal{L}(U)$ is a positive Hermitian operator. Hence, if $r = \operatorname{rk}(\tau) = \operatorname{rk}(\tau^*\tau)$, then $U$ has an ordered orthonormal basis

$$\mathcal{B} = (u_1, \ldots, u_r, u_{r+1}, \ldots, u_n)$$

of eigenvectors for $\tau^*\tau$, where the corresponding eigenvalues can be arranged so that

$$\lambda_1 \ge \cdots \ge \lambda_r > 0 = \lambda_{r+1} = \cdots = \lambda_n$$

The set $(u_{r+1}, \ldots, u_n)$ is an ordered orthonormal basis for $\ker(\tau^*\tau) = \ker(\tau)$ and so $(u_1, \ldots, u_r)$ is an ordered orthonormal basis for $\ker(\tau)^\perp = \operatorname{im}(\tau^*)$.


444 Advanced Linear Algebra 


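[Figure 17.1: $\tau$ maps the orthonormal basis $(u_1, \dots, u_r)$ of $\operatorname{im}(\tau^*)$, consisting of
eigenvectors for $\tau^*\tau$, into $\operatorname{im}(\tau)$ by $\tau(u_k) = s_k v_k$, and $\tau^*$ maps the orthonormal
basis $(v_1, \dots, v_r)$ of $\operatorname{im}(\tau)$, consisting of eigenvectors for $\tau\tau^*$, back by
$\tau^*(v_k) = s_k u_k$. Also, $\tau = 0$ on $\ker(\tau)$, spanned by $u_{r+1}, \dots, u_n$, and $\tau^* = 0$ on
$\ker(\tau^*)$, spanned by $v_{r+1}, \dots, v_m$.]

Figure 17.1
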
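For $i = 1, \dots, r$, the positive numbers $s_i = \sqrt{\lambda_i}$ are called the singular values
of $\tau$. If we set $s_i = 0$ for $i > r$, then

$$\tau^*\tau u_i = s_i^2 u_i$$

for $i = 1, \dots, n$. We can achieve some “symmetry” here between $\tau$ and $\tau^*$ by
setting $v_i = (1/s_i)\tau u_i$ for each $i \le r$, giving

$$\tau u_i = \begin{cases} s_i v_i & i \le r \\ 0 & i > r \end{cases}$$

and

$$\tau^* v_i = \begin{cases} s_i u_i & i \le r \\ 0 & i > r \end{cases}$$

The vectors $v_1, \dots, v_r$ are orthonormal, since if $i, j \le r$, then

$$\langle v_i, v_j \rangle = \frac{1}{s_i s_j}\langle \tau u_i, \tau u_j \rangle = \frac{1}{s_i s_j}\langle \tau^*\tau u_i, u_j \rangle = \frac{s_i^2}{s_i s_j}\langle u_i, u_j \rangle = \frac{s_i}{s_j}\delta_{i,j} = \delta_{i,j}$$

Hence, $(v_1, \dots, v_r)$ is an orthonormal basis for $\operatorname{im}(\tau) = \ker(\tau^*)^\perp$, which can be
extended to an orthonormal basis $\mathcal{C} = (v_1, \dots, v_m)$ for $V$, the extension
$(v_{r+1}, \dots, v_m)$ being an orthonormal basis for $\ker(\tau^*)$. Moreover, since

$$\tau\tau^* v_i = s_i \tau u_i = s_i^2 v_i$$

the vectors $v_1, \dots, v_r$ are eigenvectors for $\tau\tau^*$ with the same eigenvalues
$\lambda_i = s_i^2$ as for $\tau^*\tau$. This completes the picture in Figure 17.1.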




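Theorem 17.1 Let $U$ and $V$ be finite-dimensional inner product spaces over $\mathbb{C}$
or $\mathbb{R}$ and let $\tau \in \mathcal{L}(U,V)$ have rank $r$. Then there are ordered orthonormal
bases $\mathcal{B}$ and $\mathcal{C}$ for $U$ and $V$, respectively, for which

$$\mathcal{B} = (\underbrace{u_1, \dots, u_r}_{\text{ONB for } \operatorname{im}(\tau^*)}, \underbrace{u_{r+1}, \dots, u_n}_{\text{ONB for } \ker(\tau)})$$

and

$$\mathcal{C} = (\underbrace{v_1, \dots, v_r}_{\text{ONB for } \operatorname{im}(\tau)}, \underbrace{v_{r+1}, \dots, v_m}_{\text{ONB for } \ker(\tau^*)})$$

Moreover, for $1 \le i \le r$,

$$\tau u_i = s_i v_i$$
$$\tau^* v_i = s_i u_i$$

where the $s_i > 0$ are called the singular values of $\tau$, defined by

$$\tau^*\tau u_i = s_i^2 u_i, \quad s_i > 0$$

for $i \le r$. The vectors $u_1, \dots, u_r$ are called the right singular vectors for $\tau$ and
the vectors $v_1, \dots, v_r$ are called the left singular vectors for $\tau$. □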


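The matrix version of the previous discussion leads to the well-known singular-value
decomposition of a matrix. Let $A \in \mathcal{M}_{m,n}(F)$ and let $\mathcal{B} = (u_1, \dots, u_n)$
and $\mathcal{C} = (v_1, \dots, v_m)$ be the orthonormal bases from $U$ and $V$, respectively, in
Theorem 17.1, for the operator $\tau_A$. Then

$$[\tau_A]_{\mathcal{B},\mathcal{C}} = \Sigma = \operatorname{diag}(s_1, s_2, \dots, s_r, 0, \dots, 0)$$

A change of orthonormal bases from the standard bases $\mathcal{E}_n$ and $\mathcal{E}_m$ to $\mathcal{B}$ and $\mathcal{C}$ gives

$$A = [\tau_A]_{\mathcal{E}_n,\mathcal{E}_m} = M_{\mathcal{C},\mathcal{E}_m}[\tau_A]_{\mathcal{B},\mathcal{C}}M_{\mathcal{E}_n,\mathcal{B}} = P\Sigma Q^*$$

where $P = M_{\mathcal{C},\mathcal{E}_m}$ and $Q = M_{\mathcal{B},\mathcal{E}_n}$ are unitary/orthogonal. This is the singular-value
decomposition of $A$.

As to uniqueness, if $A = P\Sigma Q^*$, where $P$ and $Q$ are unitary and $\Sigma$ is diagonal,
with diagonal entries $\lambda_i$, then

$$A^*A = (P\Sigma Q^*)^* P\Sigma Q^* = Q\Sigma^*\Sigma Q^*$$

and since $\Sigma^*\Sigma = \operatorname{diag}(\lambda_1^2, \dots, \lambda_n^2)$, it follows that the $\lambda_i^2$'s are eigenvalues of
$A^*A$, that is, they are the squares of the singular values along with a sufficient
number of 0's. Hence, $\Sigma$ is uniquely determined by $A$, up to the order of the
diagonal elements.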




We state without proof the following uniqueness facts and refer the reader to
[48] for details. If $n < m$ and if the eigenvalues $\lambda_i$ are distinct, then $P$ is
uniquely determined up to multiplication on the right by a diagonal matrix of the
form $D = \operatorname{diag}(z_1, \dots, z_m)$ with $|z_i| = 1$. If $n < m$, then $Q$ is never uniquely
determined. If $m = n = r$, then for any given $P$ there is a unique $Q$. Thus, we
see that, in general, the singular-value decomposition is not unique.
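
As an informal numerical illustration of the definition of singular values (not part of the original text), the following Python sketch computes the singular values of a real $2 \times 2$ matrix as the square roots of the eigenvalues of $A^tA$. The function name `singular_values_2x2` and the sample matrix are assumptions chosen for this example, and only the standard library is used.

```python
import math

def singular_values_2x2(a):
    """Return (s1, s2) with s1 >= s2 >= 0 for a real 2x2 matrix a,
    computed as square roots of the eigenvalues of a^T a."""
    (p, q), (r, s) = a
    # Entries of the symmetric positive matrix G = a^T a.
    g11 = p * p + r * r
    g12 = p * q + r * s
    g22 = q * q + s * s
    # The eigenvalues of G are the roots of x^2 - tr(G) x + det(G).
    tr = g11 + g22
    det = g11 * g22 - g12 * g12
    disc = math.sqrt(max(tr * tr / 4.0 - det, 0.0))
    lam1 = tr / 2.0 + disc
    lam2 = max(tr / 2.0 - disc, 0.0)  # guard against tiny negative rounding
    return math.sqrt(lam1), math.sqrt(lam2)

# Hypothetical sample matrix with rows (3, 0) and (4, 5).
s1, s2 = singular_values_2x2([[3.0, 0.0], [4.0, 5.0]])
```

For this sample matrix, $A^tA$ has eigenvalues 45 and 5, so the singular values are $3\sqrt{5}$ and $\sqrt{5}$, and their product equals $|\det A| = 15$.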


The Moore—Penrose Generalized Inverse 


Singular values lead to a generalization of the inverse of an operator that applies
to all linear transformations. The setup is the same as in Figure 17.1. Referring
to that figure, we are prompted to define a linear transformation $\tau^+: V \to U$ by

$$\tau^+ v_i = \begin{cases} \dfrac{1}{s_i} u_i & \text{for } i \le r \\ 0 & \text{for } i > r \end{cases}$$

since then

$$(\tau^+\tau)|_{\langle u_1, \dots, u_r \rangle} = \iota, \qquad (\tau^+\tau)|_{\langle u_{r+1}, \dots, u_n \rangle} = 0$$

and

$$(\tau\tau^+)|_{\langle v_1, \dots, v_r \rangle} = \iota, \qquad (\tau\tau^+)|_{\langle v_{r+1}, \dots, v_m \rangle} = 0$$

Hence, if $n = m = r$, then $\tau^+ = \tau^{-1}$. The transformation $\tau^+$ is called the
Moore-Penrose generalized inverse or Moore-Penrose pseudoinverse of $\tau$.
We abbreviate this as MP inverse.


Note that the composition $\tau^+\tau$ is the identity on the largest possible subspace of
$U$ on which any composition of the form $\sigma\tau$ could be the identity, namely, the
orthogonal complement of the kernel of $\tau$. A similar statement holds for the
composition $\tau\tau^+$. Hence, $\tau^+$ is as “close” to an inverse for $\tau$ as is possible.


We have said that if $\tau$ is invertible, then $\tau^+ = \tau^{-1}$. More is true: If $\tau$ is
injective, then $\tau^+\tau = \iota$ and so $\tau^+$ is a left inverse for $\tau$. Also, if $\tau$ is surjective,
then $\tau^+$ is a right inverse for $\tau$. Hence the MP inverse $\tau^+$ generalizes the one-sided
inverses as well.


Here is a characterization of the MP inverse. 


Theorem 17.2 Let $\tau \in \mathcal{L}(U,V)$. The MP inverse $\tau^+$ of $\tau$ is completely
characterized by the following four properties:

1) $\tau\tau^+\tau = \tau$
2) $\tau^+\tau\tau^+ = \tau^+$
3) $\tau\tau^+$ is Hermitian
4) $\tau^+\tau$ is Hermitian




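Proof. We leave it to the reader to show that $\tau^+$ does indeed satisfy conditions
1)-4) and prove only the uniqueness. Suppose that $\rho$ and $\sigma$ satisfy 1)-4) when
substituted for $\tau^+$. Then

$$\begin{aligned}
\rho &= \rho\tau\rho \\
&= (\rho\tau)^*\rho \\
&= \tau^*\rho^*\rho \\
&= (\tau\sigma\tau)^*\rho^*\rho \\
&= \tau^*\sigma^*\tau^*\rho^*\rho \\
&= (\sigma\tau)^*\tau^*\rho^*\rho \\
&= \sigma\tau\tau^*\rho^*\rho \\
&= \sigma\tau\rho\tau\rho \\
&= \sigma\tau\rho
\end{aligned}$$

and

$$\begin{aligned}
\sigma &= \sigma\tau\sigma \\
&= \sigma(\tau\sigma)^* \\
&= \sigma\sigma^*\tau^* \\
&= \sigma\sigma^*(\tau\rho\tau)^* \\
&= \sigma\sigma^*\tau^*\rho^*\tau^* \\
&= \sigma\sigma^*\tau^*(\tau\rho)^* \\
&= \sigma\sigma^*\tau^*\tau\rho \\
&= \sigma\tau\sigma\tau\rho \\
&= \sigma\tau\rho
\end{aligned}$$

which shows that $\rho = \sigma$. □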


The MP inverse can also be defined for matrices. In particular, if $A \in \mathcal{M}_{m,n}(F)$,
then the matrix operator $\tau_A$ has an MP inverse $\tau_A^+$. Since this is a linear
transformation from $F^m$ to $F^n$, it is just multiplication by a matrix: $\tau_A^+ = \tau_B$.
This matrix $B$ is the MP inverse for $A$ and is denoted by $A^+$.


Since $\tau_A^* = \tau_{A^*}$ and $\tau_{AB} = \tau_A\tau_B$, the matrix version of Theorem 17.2 implies
that $A^+$ is completely characterized by the four conditions

1) $AA^+A = A$
2) $A^+AA^+ = A^+$
3) $AA^+$ is Hermitian
4) $A^+A$ is Hermitian

Moreover, if

$$A = U_1\Sigma U_2^*$$

is the singular-value decomposition of $A$, then

$$A^+ = U_2\Sigma' U_1^*$$

where $\Sigma'$ is obtained from $\Sigma$ by replacing all nonzero entries by their
multiplicative inverses (and transposing when $m \ne n$, so that $\Sigma'$ is $n \times m$).
This follows from the characterization above and also from the fact that for $i \le r$,

$$U_2\Sigma' U_1^* v_i = U_2\Sigma' e_i = s_i^{-1}U_2 e_i = s_i^{-1}u_i$$

and for $i > r$,

$$U_2\Sigma' U_1^* v_i = U_2\Sigma' e_i = 0$$
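
The four conditions above can be checked mechanically on a small example. The sketch below is only an illustration under stated assumptions: it uses exact rational arithmetic, a hypothetical $3 \times 2$ real matrix with independent columns, and the standard formula $A^+ = (A^tA)^{-1}A^t$ for that special case, which is not derived in the text. The helper names are invented for this example.

```python
from fractions import Fraction

def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def transpose(x):
    return [list(col) for col in zip(*x)]

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def pinv_full_column_rank(a):
    """MP inverse of a real matrix with two independent columns,
    using the standard formula (a^T a)^{-1} a^T (an assumption here)."""
    at = transpose(a)
    return matmul(inverse_2x2(matmul(at, a)), at)

A = [[Fraction(1), Fraction(0)],
     [Fraction(0), Fraction(1)],
     [Fraction(1), Fraction(1)]]
A_plus = pinv_full_column_rank(A)

# The four characterizing conditions (in the real case, Hermitian = symmetric).
cond1 = matmul(matmul(A, A_plus), A) == A
cond2 = matmul(matmul(A_plus, A), A_plus) == A_plus
cond3 = matmul(A, A_plus) == transpose(matmul(A, A_plus))
cond4 = matmul(A_plus, A) == transpose(matmul(A_plus, A))
```

Because this $A$ is injective, $A^+A$ is the identity, in line with the remark that the MP inverse of an injective map is a left inverse.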


Least Squares Approximation 


Let us now discuss the most important use of the MP inverse. Consider the
system of linear equations

$$Ax = v$$

where $A \in \mathcal{M}_{m,n}(F)$. (As usual, $F = \mathbb{C}$ or $F = \mathbb{R}$.) This system has a solution
if and only if $v \in \operatorname{im}(\tau_A)$. If the system has no solution, then it is of considerable
practical importance to be able to solve the system

$$Ax = \hat{v}$$

where $\hat{v}$ is the unique vector in $\operatorname{im}(\tau_A)$ that is closest to $v$, as measured by the
unitary (or Euclidean) distance. This problem is called the linear least squares
problem. Any solution to the system $Ax = \hat{v}$ is called a least squares solution
to the system $Ax = v$. Put another way, a least squares solution to $Ax = v$ is a
vector $x$ for which $\|Ax - v\|$ is minimized.


Suppose that $w$ and $z$ are least squares solutions to $Ax = v$. Then

$$Aw = \hat{v} = Az$$

and so $w - z \in \ker(A)$. (We will write $A$ for $\tau_A$.) Thus, if $w$ is a particular least
squares solution, then the set of all least squares solutions is $w + \ker(A)$.
Among all solutions, the most interesting is the solution of minimum norm.
Note that if there is a least squares solution $w$ that lies in $\ker(A)^\perp$, then for any
$z \in \ker(A)$, we have

$$\|w + z\|^2 = \|w\|^2 + \|z\|^2 \ge \|w\|^2$$

and so $w$ will be the unique least squares solution of minimum norm.


Before proceeding, we recall (Theorem 9.14) that if $S$ is a subspace of a finite-dimensional
inner product space $V$, then the best approximation to a vector
$v \in V$ from within $S$ is the unique vector $\hat{v} \in S$ for which $v - \hat{v} \perp S$. Now we
can see how the MP inverse comes into play.




Theorem 17.3 Let $A \in \mathcal{M}_{m,n}(F)$. Among the least squares solutions to the
system

$$Ax = v$$

there is a unique solution of minimum norm, given by $A^+v$, where $A^+$ is the MP
inverse of $A$.

Proof. A vector $w$ is a least squares solution if and only if $Aw = \hat{v}$. Using the
characterization of the best approximation $\hat{v}$, we see that $w$ is a solution to
$Aw = \hat{v}$ if and only if

$$Aw - v \perp \operatorname{im}(A)$$

Since $\operatorname{im}(A)^\perp = \ker(A^*)$ this is equivalent to

$$A^*(Aw - v) = 0$$

or

$$A^*Aw = A^*v$$

This system of equations is called the normal equations for $Ax = v$. Its
solutions are precisely the least squares solutions to the system $Ax = v$.


To see that $w = A^+v$ is a least squares solution, recall that, in the notation of
Figure 17.1,

$$AA^+v_i = \begin{cases} v_i & i \le r \\ 0 & i > r \end{cases}$$

and so

$$A^*A(A^+v_i) = \begin{cases} A^*v_i & i \le r \\ 0 & i > r \end{cases} = A^*v_i$$

and since $\mathcal{C} = (v_1, \dots, v_m)$ is a basis for $V$, we conclude that $A^+v$ satisfies the
normal equations. Finally, since $A^+v \in \ker(A)^\perp$, we deduce by the preceding
remarks that $A^+v$ is the unique least squares solution of minimum norm. □
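
The normal equations can be made concrete with a small line-fitting problem. The following sketch is an illustration only: the data points and the function name `least_squares_line` are assumptions, the matrix $A$ has columns $(1, 1, 1)$ and $(t_1, t_2, t_3)$, and the $2 \times 2$ system $A^tAc = A^tv$ is solved by Cramer's rule in exact rational arithmetic.

```python
from fractions import Fraction

def least_squares_line(ts, vs):
    """Solve the normal equations A^T A c = A^T v, where A has
    columns (1, ..., 1) and ts, returning c = (c0, c1) for the
    fitted line c0 + c1 * t."""
    n = len(ts)
    st = sum(ts)
    stt = sum(t * t for t in ts)
    sv = sum(vs)
    stv = sum(t * v for t, v in zip(ts, vs))
    det = n * stt - st * st           # determinant of A^T A
    c0 = (stt * sv - st * stv) / det  # Cramer's rule
    c1 = (n * stv - st * sv) / det
    return c0, c1

# Hypothetical data: fit a line through (0, 1), (1, 2), (2, 2).
ts = [Fraction(0), Fraction(1), Fraction(2)]
vs = [Fraction(1), Fraction(2), Fraction(2)]
c0, c1 = least_squares_line(ts, vs)

# The residual Ac - v must be orthogonal to both columns of A.
residual = [c0 + c1 * t - v for t, v in zip(ts, vs)]
dot_ones = sum(residual)
dot_ts = sum(t * r for t, r in zip(ts, residual))
```

Here the columns of $A$ are independent, so $\ker(A)$ is trivial and the least squares solution is unique; it therefore coincides with the minimum norm solution $A^+v$ of Theorem 17.3.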


Exercises 


1. Let $\tau \in \mathcal{L}(U)$. Show that the singular values of $\tau^*$ are the same as those of
   $\tau$.

2. Find the singular values and the singular value decomposition of the matrix

$$A = \begin{bmatrix} 3 & 1 \\ 6 & 2 \end{bmatrix}$$

   Find $A^+$.


3. Find the singular values and the singular value decomposition of the matrix 




12 0 
asa 2 


   Find $A^+$. Hint: Is it better to work with $A^*A$ or $AA^*$?

4. Let $X = (x_1 \; x_2 \; \cdots \; x_n)^t$ be a column matrix over $\mathbb{C}$. Find a singular-value
   decomposition of $X$.

5. Let $A \in \mathcal{M}_{m,n}(F)$ and let $B \in \mathcal{M}_{m+n,m+n}(F)$ be the square matrix

$$B = \begin{bmatrix} 0 & A \\ A^* & 0 \end{bmatrix}_{\text{block}}$$

   Show that, counting multiplicity, the nonzero eigenvalues of $B$ are
   precisely the singular values of $A$ together with their negatives. Hint: Let
   $A = U_1\Sigma U_2^*$ be a singular-value decomposition of $A$ and try factoring $B$
   into a product $USU^*$ where $U$ is unitary. Do not read the following second
   hint unless you get stuck. Second Hint: Verify the block factorization

$$B = \begin{bmatrix} 0 & U_1 \\ U_2 & 0 \end{bmatrix}\begin{bmatrix} 0 & \Sigma^* \\ \Sigma & 0 \end{bmatrix}\begin{bmatrix} 0 & U_2^* \\ U_1^* & 0 \end{bmatrix}$$

   What are the eigenvalues of the middle factor on the right? (Try $e_1 + e_{n+1}$
   and $e_1 - e_{n+1}$.)

6. Use the results of the previous exercise to show that a matrix
   $A \in \mathcal{M}_{m,n}(F)$, its adjoint $A^*$, its transpose $A^t$ and its conjugate $\bar{A}$ all have
   the same singular values. Show also that if $U$ and $U'$ are unitary, then $A$
   and $UAU'$ have the same singular values.

7. Let $A \in \mathcal{M}_n(F)$ be nonsingular. Show that the following procedure
   produces a singular-value decomposition $A = U_1\Sigma U_2^*$ of $A$.
   a) Write $AA^* = UDU^*$ where $D = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$ and the $\lambda_i$'s are
      positive and the columns of $U$ form an orthonormal basis of
      eigenvectors for $AA^*$. (We never said that this was a practical procedure.)
   b) Let $\Sigma = \operatorname{diag}(\lambda_1^{1/2}, \dots, \lambda_n^{1/2})$ where the square roots are nonnegative.
      Also let $U_1 = U$ and $U_2 = A^*U\Sigma^{-1}$.

8. If $A = (a_{i,j})$ is an $n \times m$ matrix, then the Frobenius norm of $A$ is

$$\|A\|_F = \Big(\sum_{i,j} |a_{i,j}|^2\Big)^{1/2}$$

   Show that $\|A\|_F^2 = \sum_i s_i^2$ is the sum of the squares of the singular values of $A$.


Chapter 18 
An Introduction to Algebras 


Motivation 


We have spent considerable time studying the structure of a linear operator
$\tau \in \mathcal{L}(V)$ on a finite-dimensional vector space $V$ over a field $F$. In our
studies, we defined the $F[x]$-module $V_\tau$ and used the decomposition theorems
for modules over a principal ideal domain to dissect this module. We
concentrated on an individual operator $\tau$, rather than the entire vector space
$\mathcal{L}_F(V)$. In fact, we have made relatively little use of the fact that $\mathcal{L}_F(V)$ is an
algebra under composition. In this chapter, we give a brief introduction to the
theory of algebras, of which $\mathcal{L}_F(V)$ is the most general, in the sense of Theorem
18.2 below.


Associative Algebras 


An algebra is a combination of a ring and a vector space, with an axiom that 
links the ring product with scalar multiplication. 


Definition An (associative) algebra $A$ over a field $F$, or an $F$-algebra, is a
nonempty set $A$, together with three operations, called addition (denoted by
$+$), multiplication (denoted by juxtaposition) and scalar multiplication (also
denoted by juxtaposition), for which the following properties hold:

1) $A$ is a vector space over $F$ under addition and scalar multiplication.

2) $A$ is a ring with identity under addition and multiplication.

3) If $r \in F$ and $a, b \in A$, then

$$r(ab) = (ra)b = a(rb)$$

An algebra is finite-dimensional if it is finite-dimensional as a vector space. An
algebra is commutative if $A$ is a commutative ring. An element $a \in A$ is
invertible if there is $b \in A$ for which $ab = ba = 1$. □


Our definition requires that $A$ have a multiplicative identity. Such algebras are
called unital algebras. Algebras without unit are also of great importance, but
we will not study them here. Also, in this chapter, we will assume that all
algebras are associative. Nonassociative algebras, such as Lie algebras and
Jordan algebras, are important as well.


The Center of an Algebra

Definition The center of an $F$-algebra $A$ is the set

$$Z(A) = \{a \in A \mid ax = xa \text{ for all } x \in A\}$$

of all elements of $A$ that commute with every element of $A$. □


The center of an algebra is never trivial since it contains a copy of $F$:

$$\{r1 \mid r \in F\} \subseteq Z(A)$$

Definition An $F$-algebra $A$ is central if its center is as small as possible, that
is, if

$$Z(A) = \{r1 \mid r \in F\}$$ □


From a Vector Space to an Algebra 


If $V$ is a vector space over a field $F$ and if $\mathcal{B} = \{b_i \mid i \in I\}$ is a basis for $V$,
then it is natural to wonder whether we can form an $F$-algebra simply by
defining a product for the basis elements and then using the distributive laws to
extend the product to $V$. In particular, we choose a set of constants $\alpha_{i,j}^k$ with the
property that for each pair $(i, j)$, only finitely many of the $\alpha_{i,j}^k$ are nonzero. Then
we set

$$b_i b_j = \sum_k \alpha_{i,j}^k b_k$$

and make multiplication bilinear, that is,

$$\Big(\sum_{i=1}^n r_i b_i\Big)\Big(\sum_{j=1}^m s_j b_j\Big) = \sum_{i=1}^n \sum_{j=1}^m r_i s_j\, b_i b_j$$

and

$$r\Big(\sum_{i=1}^n r_i b_i\Big) = \sum_{i=1}^n r r_i b_i$$

for $r \in F$. It is easy to see that this does define a nonunital associative algebra
$A$ provided that

$$(b_i b_j) b_k = b_i (b_j b_k)$$

for all $i, j, k \in I$ and that $A$ is commutative if and only if

$$b_i b_j = b_j b_i$$

for all $i, j \in I$. The constants $\alpha_{i,j}^k$ are called the structure constants for the
algebra $A$. To get a unital algebra, we can take, for a given $i \in I$, the structure
constants to be

$$\alpha_{i,j}^k = \delta_{k,j} = \alpha_{j,i}^k$$

for all $j, k \in I$, in which case $b_i$ is the multiplicative identity. (An alternative is to
adjoin a new element to the basis and define its structure constants in this way.)


Examples 


The following examples will make it clear why algebras are important. 


Example 18.1 If $F \le E$ are fields, then $E$ is a vector space over $F$. This vector
space structure, along with the ring structure of $E$, is an algebra over $F$. □


Example 18.2 The ring $F[x]$ of polynomials is an algebra over $F$. □


Example 18.3 The ring $\mathcal{M}_n(F)$ of all $n \times n$ matrices over a field $F$ is an
algebra over $F$, where scalar multiplication is defined by

$$M = (a_{i,j}),\ r \in F \ \Rightarrow\ rM = (ra_{i,j})$$ □


Example 18.4 The set $\mathcal{L}_F(V)$ of all linear operators on a vector space $V$ over a
field $F$ is an $F$-algebra, where addition is addition of functions, multiplication is
composition of functions and scalar multiplication is given by

$$(r\sigma)(v) = r[\sigma(v)]$$

The identity map $\iota \in \mathcal{L}_F(V)$ is the multiplicative identity and the zero map
$0 \in \mathcal{L}_F(V)$ is the additive identity. This algebra is also denoted by $\operatorname{End}_F(V)$,
since the linear operators on $V$ are also called endomorphisms of $V$. □


Example 18.5 If $G$ is a group and $F$ is a field, then we can form a vector space
$F[G]$ over $F$ by taking all formal $F$-linear combinations of elements of $G$ and
treating $G$ as a basis for $F[G]$. This vector space can be made into an $F$-algebra
where the structure constants are determined by the group product, that is, if
$g_i g_j = g_u$, then $\alpha_{i,j}^k = \delta_{k,u}$. The group identity $g_1 = 1$ is the algebra identity
since $g_1 g_j = g_j$ and so $\alpha_{1,j}^k = \delta_{k,j}$ and similarly, $\alpha_{j,1}^k = \delta_{k,j}$.


The resulting associative algebra $F[G]$ is called the group algebra over $F$.
Specifically, the elements of $F[G]$ have the form

$$x = r_1 g_1 + \cdots + r_n g_n$$

where $r_i \in F$ and $g_i \in G$. If

$$y = s_1 h_1 + \cdots + s_m h_m$$

then we can include additional terms with 0 coefficients and reindex if
necessary so that we may assume that $m = n$ and $g_i = h_i$ for all $i$. Then the sum
in $F[G]$ is given by

$$\Big(\sum_{i=1}^n r_i g_i\Big) + \Big(\sum_{i=1}^n s_i g_i\Big) = \sum_{i=1}^n (r_i + s_i) g_i$$

Also, the product is given by

$$\Big(\sum_{i=1}^n r_i g_i\Big)\Big(\sum_{j=1}^n s_j h_j\Big) = \sum_{i,j} r_i s_j\, g_i h_j$$

and the scalar product is

$$r\Big(\sum_{i=1}^n r_i g_i\Big) = \sum_{i=1}^n (r r_i) g_i$$ □
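
The product in a group algebra can be sketched in a few lines of code. The example below is only an illustration under stated assumptions: $G$ is taken to be a cyclic group $\{1, a, \dots, a^{n-1}\}$, an element of $F[G]$ is stored as a dictionary mapping the exponent $k$ to the coefficient of $a^k$, and the function name `group_algebra_product` is invented for this example.

```python
from fractions import Fraction

def group_algebra_product(x, y, n):
    """Multiply two elements of F[G] for G cyclic of order n.
    An element is a dict {k: coefficient of a^k}."""
    result = {}
    for i, r in x.items():
        for j, s in y.items():
            k = (i + j) % n  # a^i * a^j = a^((i + j) mod n)
            result[k] = result.get(k, 0) + r * s
    # Drop zero coefficients so that equal elements compare equal.
    return {k: c for k, c in result.items() if c != 0}

# In Q[G] with G = {1, a, a^2}: x = 1 + 2a and y = a - a^2.
x = {0: Fraction(1), 1: Fraction(2)}
y = {1: Fraction(1), 2: Fraction(-1)}
xy = group_algebra_product(x, y, 3)
```

Expanding by bilinearity gives $xy = a - a^2 + 2a^2 - 2a^3 = -2 + a + a^2$, since $a^3 = 1$. Because this group is abelian, the product is commutative.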


i= 


The Usual Suspects 


Algebras have substructures and structure-preserving maps, as do groups, rings 
and other algebraic structures. 


Subalgebras 


Definition Let $A$ be an $F$-algebra. A subalgebra of $A$ is a subset $B$ of $A$ that is
a subring of $A$ (with the same identity as $A$) and a subspace of $A$. □


The intersection of subalgebras is a subalgebra and so the family of all
subalgebras of $A$ is a complete lattice, where meet is intersection and the join of
a family $\mathcal{F}$ of subalgebras is the intersection of all subalgebras of $A$ that contain
the members of $\mathcal{F}$.


The subalgebra generated by a nonempty subset $X$ of an algebra $A$ is the
smallest subalgebra of $A$ that contains $X$ and is easily seen to be the set of all
linear combinations of finite products of elements of $X$, that is, the subspace
spanned by the products of finite subsets of elements of $X$:

$$\langle X \rangle_{\text{alg}} = \langle x_1 \cdots x_n \mid x_i \in X \rangle$$

Alternatively, $\langle X \rangle_{\text{alg}}$ is the set of all polynomials in the variables in $X$. In
particular, the algebra generated by a single element $x \in A$ is the set of all
polynomials in $x$ over $F$.




Ideals and Quotients 


In defining the notion of an ideal of an algebra A, we must consider the fact that 
A may be noncommutative. 


Definition A (two-sided) ideal of an associative algebra $A$ is a nonempty
subset $I$ of $A$ that is closed under addition and subtraction, that is,

$$a, b \in I \ \Rightarrow\ a + b,\ a - b \in I$$

and also left and right multiplication by elements of $A$, that is,

$$x \in I,\ a, b \in A \ \Rightarrow\ axb \in I$$ □


The ideal generated by a nonempty subset $X$ of $A$ is the smallest ideal
containing $X$ and is equal to

$$\langle X \rangle_{\text{ideal}} = \Big\{\sum_{i=1}^n a_i x_i b_i \ \Big|\ x_i \in X,\ a_i, b_i \in A\Big\}$$


Definition An algebra $A$ is simple if

1) The product in $A$ is not trivial, that is, $ab \ne 0$ for at least one pair of
elements $a, b \in A$

2) $A$ has no proper nonzero ideals. □


Definition If $I$ is an ideal in $A$, then the quotient algebra is the quotient
ring/quotient space

$$A/I = \{a + I \mid a \in A\}$$

with operations

$$(a + I) + (b + I) = (a + b) + I$$
$$(a + I)(b + I) = ab + I$$
$$r(a + I) = ra + I$$

where $a, b \in A$ and $r \in F$. These operations make $A/I$ an $F$-algebra. □


Homomorphisms 


Definition If $A$ and $B$ are $F$-algebras, a map $\sigma: A \to B$ is an algebra
homomorphism if it is a ring homomorphism as well as a linear
transformation, that is,

$$\sigma(a + a') = \sigma a + \sigma a', \quad \sigma(aa') = (\sigma a)(\sigma a'), \quad \sigma 1 = 1$$

and

$$r(\sigma a) = \sigma(ra)$$

for $r \in F$. □




The usual terms monomorphism, epimorphism, isomorphism, embedding, 
endomorphism and automorphism apply to algebras with the analogous meaning 
as for vector spaces and modules. 


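Example 18.6 Let $V$ be an $n$-dimensional vector space over $F$. Fix an ordered
basis $\mathcal{B}$ for $V$. Consider the map $\mu: \mathcal{L}(V) \to \mathcal{M}_n(F)$ defined by

$$\mu(\sigma) = [\sigma]_{\mathcal{B}}$$

where $[\sigma]_{\mathcal{B}}$ is the matrix representation of $\sigma$ with respect to the ordered basis $\mathcal{B}$.
This map is a vector space isomorphism and since

$$[\tau\sigma]_{\mathcal{B}} = [\tau]_{\mathcal{B}}[\sigma]_{\mathcal{B}}$$

it is also an algebra isomorphism. □
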
Another View of Algebras 


If $A$ is an algebra over $F$, then $A$ contains a copy of $F$. Specifically, we define a
function $\lambda: F \to A$ by

$$\lambda r = r1$$

for all $r \in F$, where $1 \in A$ is the multiplicative identity. The elements $r1$ are in
the center of $A$, since for any $a \in A$,

$$(r1)a = r(1a) = ra$$

and

$$a(r1) = r(a1) = ra$$

Thus, $\lambda: F \to Z(A)$. To see that $\lambda$ is a ring homomorphism, we have

$$\lambda(1_F) = 1_F \cdot 1 = 1$$
$$\lambda(r + s) = (r + s)1 = r1 + s1 = \lambda(r) + \lambda(s)$$
$$\lambda(rs) = (rs)1 = r(s1) = r\lambda(s) = r(1 \cdot \lambda(s)) = (r \cdot 1)\lambda(s) = \lambda(r)\lambda(s)$$

Moreover, if $r1 = 0$ and $r \ne 0$, then

$$0 = r^{-1}(r1) = 1_F \cdot 1 = 1$$

and so provided that $0 \ne 1$ in $A$, we have $r = 0$. Thus, $\lambda$ is an embedding.

Theorem 18.1

1) If $A$ is an associative algebra over $F$ and if $0 \ne 1$ in $A$, then the map
$\lambda: F \to Z(A)$ defined by

$$\lambda r = r1$$

is an embedding of the field $F$ into the center $Z(A)$ of the ring $A$. Thus, $F$
can be embedded as a subring of $Z(A)$.

2) Conversely, if $R$ is a ring with identity and if $F \subseteq Z(R)$ is a field, then $R$
is an $F$-algebra with scalar multiplication defined by the product in $R$. □


One interesting consequence of this theorem is that a ring $R$ whose center does
not contain a field is not an algebra over any field $F$. This happens, for example,
with the ring $\mathbb{Z}_6$.


The Regular Representation of an Algebra 


An algebra homomorphism $\sigma: A \to \mathcal{L}_F(V)$ is called a representation of the
algebra $A$ in $\mathcal{L}_F(V)$. A representation $\sigma$ is faithful if it is injective, that is, if $\sigma$
is an embedding. In this case, $A$ is isomorphic to a subalgebra of $\mathcal{L}_F(V)$.


Actually, the endomorphism algebras $\mathcal{L}_F(V)$ are the most general algebras
possible, in the sense that any algebra $A$ has a faithful representation in some
endomorphism algebra.


Theorem 18.2 Any associative $F$-algebra $A$ is isomorphic to a subalgebra of
the endomorphism algebra $\mathcal{L}_F(A)$. In fact, if $\mu_a$ is the left multiplication map
defined by

$$\mu_a x = ax$$

then the map $\mu: A \to \mathcal{L}_F(A)$ is an algebra embedding, called the left regular
representation of $A$. □


When $\dim(A) = n < \infty$, we can select an ordered basis $\mathcal{B}$ for $A$ and represent
the elements of $\mathcal{L}_F(A)$ by matrices. This gives an embedding of $A$ into the
matrix algebra $\mathcal{M}_n(F)$, called the left regular matrix representation of $A$
with respect to the ordered basis $\mathcal{B}$.


Example 18.7 Let $G = \{1, a, \dots, a^{n-1}\}$ be a finite cyclic group. Let

$$\mathcal{B} = (1, a, \dots, a^{n-1})$$

be an ordered basis for the group algebra $F[G]$. The multiplication map $\mu_{a^k}$ that
is multiplication by $a^k$ is a shifting of $\mathcal{B}$ (with wraparound) and so the matrix
representation of $\mu_{a^k}$ is the matrix whose columns are obtained from the identity
matrix by shifting $k$ columns to the right (with wraparound). For example,

$$[\mu_a]_{\mathcal{B}} = \begin{bmatrix} 0 & 0 & \cdots & 0 & 1 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \ddots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}$$

These matrices are called circulant matrices. □
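
As a small computational companion to Example 18.7 (not part of the original text), the sketch below builds the matrix of multiplication by $a^k$ in the ordered basis $(1, a, \dots, a^{n-1})$ and checks that matrix multiplication mirrors the group product. The function names are assumptions made for this example.

```python
def left_regular_matrix(k, n):
    """Matrix of multiplication by a^k on F[G], G cyclic of order n,
    in the ordered basis (1, a, ..., a^(n-1)). Column j holds the
    coordinates of a^k * a^j = a^((j + k) mod n)."""
    m = [[0] * n for _ in range(n)]
    for j in range(n):
        m[(j + k) % n][j] = 1
    return m

def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[i][t] * y[t][j] for t in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

# Multiplication by a in the cyclic group of order 4.
M1 = left_regular_matrix(1, 4)
```

Since the left regular representation is an algebra homomorphism, the matrix for $a^j$ times the matrix for $a^k$ should equal the matrix for $a^{j+k}$, with exponents taken modulo $n$.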




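Since the endomorphism algebras $\mathcal{L}_F(V)$ are of obvious importance, let us
examine them a bit more closely.


Theorem 18.3 Let $V$ be a vector space over a field $F$.
1) The algebra $\mathcal{L}_F(V)$ has center

$$Z = \{r\iota \mid r \in F\}$$

and so $\mathcal{L}_F(V)$ is central.
2) The set $I$ of all elements of $\mathcal{L}_F(V)$ that have finite rank is an ideal of
$\mathcal{L}_F(V)$ and is contained in all other nonzero ideals of $\mathcal{L}_F(V)$.
3) $\mathcal{L}_F(V)$ is simple if and only if $V$ is finite-dimensional.

Proof. We leave the proof of parts 1) and 3) as exercises. For part 2), we leave
it to the reader to show that $I$ is an ideal of $\mathcal{L}_F(V)$. Let $J$ be a nonzero ideal of
$\mathcal{L}_F(V)$. Let $f \in \mathcal{L}_F(V)$ have rank 1. Then there is a basis $\mathcal{B} = \mathcal{B}_1 \cup \mathcal{B}_2$ (a
disjoint union) and a nonzero $w \in V$ for which $\mathcal{B}_1$ is a finite set, $f(\mathcal{B}_2) = \{0\}$
and $f(b) = r_b w$ for all $b \in \mathcal{B}_1$. Thus, $f$ is a linear combination over $F$ of
endomorphisms $f_b$ defined by

$$f_b(b) = w, \quad f_b(\mathcal{B} \setminus \{b\}) = \{0\}$$

Hence, we need only show that $f_b \in J$.


If $\sigma \in J$ is nonzero, then there is an $e \in \mathcal{B}$ for which $\sigma e = u \ne 0$. If $\tau \in \mathcal{L}_F(V)$
is defined by

$$\tau b = e, \quad \tau(\mathcal{B} \setminus \{b\}) = \{0\}$$

and $\lambda \in \mathcal{L}_F(V)$ is defined by

$$\lambda u = w, \quad \lambda(\mathcal{B} \setminus \{u\}) = \{0\}$$

then

$$\lambda\sigma\tau(b) = w, \quad \lambda\sigma\tau(\mathcal{B} \setminus \{b\}) = \{0\}$$

and so $f_b = \lambda\sigma\tau \in J$. □
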
Annihilators and Minimal Polynomials 


If $A$ is an $F$-algebra and $a \in A$, then it may happen that $a$ satisfies a nonzero
polynomial $p(x) \in F[x]$. This always happens, in particular, if $A$ is finite-dimensional,
since in this case the powers

$$1, a, a^2, \dots$$

must be linearly dependent and so there is a nonzero polynomial in $a$ that is
equal to 0.


Definition Let $A$ be an $F$-algebra. An element $a \in A$ is algebraic if there is a
nonzero polynomial $p(x) \in F[x]$ for which $p(a) = 0$. If $a$ is algebraic, the
monic polynomial $m_a(x)$ of smallest degree that is satisfied by $a$ is called the
minimal polynomial of $a$. □

If $a \in A$ is algebraic over $F$, then the subalgebra generated by $a$ over $F$ is

$$F[a] = \{p(a) \mid p(x) \in F[x],\ \deg(p) < \deg(m_a)\}$$

and this is isomorphic to the quotient algebra

$$F[a] \approx \frac{F[x]}{\langle m_a(x) \rangle}$$

where $\langle m_a(x) \rangle$ is the ideal generated by the minimal polynomial of $a$. We leave
the details of this as an exercise.


The minimal polynomial can be used to tell when an element is invertible. 


Theorem 18.4
1) The minimal polynomial $m_a(x)$ of $a \in A$ generates the annihilator of $a$,
that is, the ideal

$$\operatorname{ann}(a) = \{f(x) \in F[x] \mid f(a) = 0\}$$

of all polynomials that annihilate $a$.

2) The element $a \in A$ is invertible if and only if $m_a(x)$ has nonzero constant
term.

Proof. We prove only the second statement. If $a$ is invertible but

$$m_a(x) = xp(x)$$

then $0 = m_a(a) = ap(a)$. Multiplying by $a^{-1}$ gives $p(a) = 0$, which contradicts
the minimality of $\deg(m_a(x))$. Conversely, if

$$m_a(x) = \alpha_0 + \alpha_1 x + \cdots + \alpha_{n-1}x^{n-1} + x^n$$

where $\alpha_0 \ne 0$, then

$$0 = \alpha_0 + \alpha_1 a + \cdots + \alpha_{n-1}a^{n-1} + a^n$$

and so

$$-\frac{1}{\alpha_0}\left(\alpha_1 + \alpha_2 a + \cdots + \alpha_{n-1}a^{n-2} + a^{n-1}\right)a = 1$$

and so

$$a^{-1} = -\frac{1}{\alpha_0}\left(\alpha_1 + \alpha_2 a + \cdots + \alpha_{n-1}a^{n-2} + a^{n-1}\right)$$ □
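
The formula for $a^{-1}$ in the proof above can be tried out in the matrix algebra $\mathcal{M}_2(\mathbb{Q})$. The following sketch is illustrative only: the function name `inverse_from_min_poly`, the sample matrix and the choice of exact rational arithmetic are assumptions, and the minimal polynomial is supplied by hand rather than computed.

```python
from fractions import Fraction

def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[i][t] * y[t][j] for t in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def inverse_from_min_poly(a, coeffs):
    """Given a square matrix a and the coefficients
    [alpha_0, ..., alpha_{n-1}] of its monic minimal polynomial
    (with alpha_0 != 0), return
    a^{-1} = -(1/alpha_0)(alpha_1 + alpha_2 a + ... + a^{n-1})."""
    size = len(a)
    identity = [[Fraction(int(i == j)) for j in range(size)]
                for i in range(size)]
    higher = list(coeffs[1:]) + [Fraction(1)]  # alpha_1, ..., alpha_{n-1}, 1
    total = [[Fraction(0)] * size for _ in range(size)]
    power = identity                           # current power of a, starting at a^0
    for c in higher:
        total = [[total[i][j] + c * power[i][j] for j in range(size)]
                 for i in range(size)]
        power = matmul(power, a)
    return [[-entry / coeffs[0] for entry in row] for row in total]

# This a has distinct eigenvalues 2 and 3, so m_a(x) = x^2 - 5x + 6.
a = [[Fraction(2), Fraction(1)], [Fraction(0), Fraction(3)]]
a_inv = inverse_from_min_poly(a, [Fraction(6), Fraction(-5)])
```

With $\alpha_0 = 6$ and $\alpha_1 = -5$, the formula reads $a^{-1} = -\frac{1}{6}(a - 5\iota)$, a polynomial in $a$ as the theorem predicts.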
ag 




Theorem 18.5 If $A$ is a finite-dimensional $F$-algebra, then every element of $A$
is algebraic. There are infinite-dimensional algebras in which all elements are
algebraic.

Proof. The first statement has been proved. To prove the second, let us consider
the complex field $\mathbb{C}$ as a $\mathbb{Q}$-algebra. The set $\mathbb{A}$ of algebraic elements of $\mathbb{C}$ is a
field, known as the field of algebraic numbers. These are the complex numbers
that are roots of some nonzero polynomial with rational (or integral)
coefficients.


To see this, if $a \in \mathbb{A}$, then the subalgebra $\mathbb{Q}[a]$ is finite-dimensional. Also, $\mathbb{Q}[a]$
is a field. To prove this, first note that since $\mathbb{C}$ is a field, the minimal polynomial
of any nonzero $a \in \mathbb{A}$ is irreducible, for if $m_a(x) = p(x)q(x)$, then
$0 = p(a)q(a)$ and so one of $p(a)$ and $q(a)$ is 0, which implies that
$p(x) = m_a(x)$ or $q(x) = m_a(x)$. Since $m_a(x)$ is irreducible, it has nonzero
constant term and so the inverse of $a$ is a polynomial in $a$, that is, $a^{-1} \in \mathbb{Q}[a]$.
Of course, $\mathbb{Q}[a]$ is closed under addition and multiplication and so $\mathbb{Q}[a]$ is a
subfield of $\mathbb{C}$.


Thus, $\mathbb{C}$ is an algebra over $\mathbb{Q}[a]$. By similar reasoning, if $b \in \mathbb{A}$, then the
minimal polynomial of $b$ over $\mathbb{Q}[a]$ is irreducible and so $b^{-1} \in \mathbb{Q}[a][b]$. Since
$\mathbb{Q}[a][b] = \mathbb{Q}[a, b]$ is the set of all polynomials in the “variables” $a$ and $b$, it is
closed under addition and multiplication as well. Hence, $\mathbb{Q}[a, b]$ is a finite-dimensional
algebra over $\mathbb{Q}[a]$, as well as a subfield of $\mathbb{C}$. Now,

$$\dim_{\mathbb{Q}}(\mathbb{Q}[a, b]) = \dim_{\mathbb{Q}[a]}(\mathbb{Q}[a, b]) \cdot \dim_{\mathbb{Q}}(\mathbb{Q}[a])$$

and so $\mathbb{Q}[a, b]$ is finite-dimensional over $\mathbb{Q}$. Hence, the elements of $\mathbb{Q}[a, b]$ are
algebraic over $\mathbb{Q}$, that is, $\mathbb{Q}[a, b] \subseteq \mathbb{A}$. But $\mathbb{Q}[a, b]$ contains $a^{-1}$, $a + b$, $a - b$ and
$ab$ and so $\mathbb{A}$ is a field.


We claim that $\mathbb{A}$ is not finite-dimensional over $\mathbb{Q}$. This follows from the fact
that for every prime $p$, the polynomial $x^n - p$ is irreducible over $\mathbb{Q}$ (by
Eisenstein's criterion). Hence, if $a$ is a complex root of $x^n - p$, then $a$ has
minimal polynomial $x^n - p$ over $\mathbb{Q}$ and so the dimension of $\mathbb{Q}[a]$ over $\mathbb{Q}$ is $n$.
Hence, $\mathbb{A}$ cannot be finite-dimensional. □


The Spectrum of an Element 


Let $A$ be an algebra. A nonzero element $a \in A$ is a left zero divisor if $ab = 0$
for some $b \ne 0$ and a right zero divisor if $ca = 0$ for some $c \ne 0$. In the
exercises, we ask the reader to show that an algebraic element is a left zero
divisor if and only if it is a right zero divisor.


Theorem 18.6 Let $A$ be an algebra. An algebraic element $a \in A$ is invertible if
and only if it is not a zero divisor.

Proof. If $a$ is invertible and $ab = 0$, then multiplying by $a^{-1}$ gives $b = 0$.
Conversely, suppose that $a$ is not invertible but $ab = 0$ implies $b = 0$. Then
$m_a(x) = xp(x)$ for some nonzero polynomial $p(x)$ and so $0 = ap(a)$, which
implies that $p(a) = 0$, a contradiction to the minimality of $m_a(x)$. □


We have seen that the eigenvalues of a linear operator $\tau$ on a finite-dimensional
vector space are the roots of the minimal polynomial of $\tau$, or equivalently, the
scalars $r$ for which $\tau - r\iota$ is not invertible. By analogy, we can define the
eigenvalues of an element $a$ of an algebra $A$.


Theorem 18.7 Let $A$ be an algebra and let $a \in A$ be algebraic. An element
$r \in F$ is a root of the minimal polynomial $m_a(x)$ if and only if $a - r1$ is not
invertible in $A$.

Proof. If $a - r1$ is not invertible, then

$$m_{a-r1}(x) = xp(x)$$

and since $m_a(x + r)$ is satisfied by $a - r1$, it follows that

$$xp(x) = m_{a-r1}(x) \mid m_a(x + r)$$

Hence, $(x - r) \mid m_a(x)$. Alternatively, if $a - r1$ is not invertible, then there is
a nonzero $b \in A$ such that $(a - r1)b = 0$, that is, $ab = rb$. Hence, for any
polynomial $p(x)$ we have $p(a)b = p(r)b$. Setting $p(x) = m_a(x)$ gives
$m_a(r)b = 0$ and so, since $b \ne 0$, $m_a(r) = 0$.


Conversely, if $m_a(r) = 0$, then $m_a(x) = (x - r)p(x)$ and so
$0 = (a - r1)p(a)$, where $p(a) \ne 0$ by the minimality of $m_a(x)$. This shows that
$a - r1$ is a zero divisor and therefore not invertible. □


Definition Let $A$ be an $F$-algebra and let $a \in A$ be algebraic. The roots of the
minimal polynomial of $a$ are called the eigenvalues of $a$. The set of all
eigenvalues of $a$

$$\operatorname{Spec}(a) = \{r \in F \mid m_a(r) = 0\}$$

is called the spectrum of $a$. □


Note that $a \in A$ is invertible if and only if $0 \notin \operatorname{Spec}(a)$.


Theorem 18.8 (The spectral mapping theorem) Let $A$ be an algebra over an
algebraically closed field $F$. Let $a \in A$ and let $p(x) \in F[x]$. Then

$$\operatorname{Spec}(p(a)) = p(\operatorname{Spec}(a)) = \{p(r) \mid r \in \operatorname{Spec}(a)\}$$

Proof. We leave it as an exercise to show that $p(\operatorname{Spec}(a)) \subseteq \operatorname{Spec}(p(a))$. For
the reverse inclusion, let $r \in \operatorname{Spec}(p(a))$ and suppose that

$$p(x) - r = (x - r_1)^{e_1} \cdots (x - r_n)^{e_n}$$

Then

$$p(a) - r1 = (a - r_1 1)^{e_1} \cdots (a - r_n 1)^{e_n}$$

and since the left-hand side is not invertible, neither is one of the factors
$a - r_k 1$, whence $r_k \in \operatorname{Spec}(a)$. But

$$p(r_k) - r = 0$$

and so $r = p(r_k) \in p(\operatorname{Spec}(a))$. Hence, $\operatorname{Spec}(p(a)) \subseteq p(\operatorname{Spec}(a))$. □

Theorem 18.9 Let $A$ be an algebra over an algebraically closed field $F$. If
$a, b \in A$, then

$$\operatorname{Spec}(ab) = \operatorname{Spec}(ba)$$

Proof. If $0 \ne r \notin \operatorname{Spec}(ba)$, then $ba - r1$ is invertible and a simple computation
gives

$$(ab - r1)[a(ba - r1)^{-1}b - 1] = r1$$

and so $ab - r1$ is invertible and $r \notin \operatorname{Spec}(ab)$. If $0 \notin \operatorname{Spec}(ba)$, then $ba$ is
invertible. We leave it as an exercise to show that this implies that $ab$ is also
invertible and so $0 \notin \operatorname{Spec}(ab)$. Thus, $\operatorname{Spec}(ab) \subseteq \operatorname{Spec}(ba)$ and by symmetry,
equality must hold. □


Division Algebras 


Some important associative algebras A have the property that all nonzero 
elements are invertible and yet A is not a field since it is not commutative. 


Definition An associative algebra D over a field F is a division algebra if 
every nonzero element has a multiplicative inverse. O 


Our goal in this section is to classify all finite-dimensional division algebras
over the real field $\mathbb{R}$, over any algebraically closed field $F$ and over any finite
field. The classification of finite-dimensional division algebras over the rational
field $\mathbb{Q}$ is quite complicated and we will not treat it here.


The Quaternions 


Perhaps the most famous noncommutative division algebra is the following.
Define a real vector space $\mathbb{H}$ with basis

$$\mathcal{B} = \{1, i, j, k\}$$

To make $\mathbb{H}$ into an $\mathbb{R}$-algebra, define the product of basis vectors as follows:

1) $1x = x1 = x$ for all $x \in \mathcal{B}$
2) $i^2 = j^2 = k^2 = -1$
3) $ij = k$, $jk = i$, $ki = j$
4) $ji = -k$, $kj = -i$, $ik = -j$

Note that 3) can be stated as follows: The product of two consecutive elements
$i, j, k$ is the next element (with wraparound). Also, 4) says that $yx = -xy$ for
distinct $x, y \in \{i, j, k\}$. This product is extended to all of $\mathbb{H}$ by distributivity.


We leave it to the reader to verify that $\mathbb{H}$ is a division algebra, called Hamilton's
quaternions, after their discoverer William Rowan Hamilton (1805-1865).
(Readers familiar with group theory will recognize the quaternion group $Q =
\{\pm 1, \pm i, \pm j, \pm k\}$.) The quaternions have applications in geometry, computer
science and physics.
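
The multiplication rules 1)-4) can be encoded directly. The sketch below is an illustration under stated assumptions: a quaternion $a + bi + cj + dk$ is stored as the tuple `(a, b, c, d)`, the function names `qmul` and `qinv` are invented for this example, and the inverse uses the standard conjugate-over-squared-norm formula, which the text leaves to the reader.

```python
from fractions import Fraction

def qmul(p, q):
    """Product of quaternions given as (a, b, c, d) = a + bi + cj + dk,
    obtained by extending rules 1)-4) by distributivity."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)

def qinv(q):
    """Inverse of a nonzero quaternion: conjugate divided by squared norm."""
    a, b, c, d = q
    n = a * a + b * b + c * c + d * d
    return (a / n, -b / n, -c / n, -d / n)

one = (1, 0, 0, 0)
i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)

# A hypothetical nonzero quaternion with rational coordinates.
q = (Fraction(1), Fraction(2), Fraction(-1), Fraction(3))
```

Since the squared norm $a^2 + b^2 + c^2 + d^2$ is nonzero for every nonzero quaternion with real coordinates, `qinv` is defined on all nonzero elements, which is the division algebra property.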


Finite-Dimensional Division Algebras over an Algebraically Closed 
Field 


It happens that there are no interesting finite-dimensional division algebras over 
an algebraically closed field. 


Theorem 18.10 If $D$ is a finite-dimensional division algebra over an
algebraically closed field $F$, then $D = F$.

Proof. Let $a \in D$ have minimal polynomial $m_a(x)$. Since a division algebra has
no zero divisors, $m_a(x)$ must be irreducible over $F$ and so must be linear.
Hence, $m_a(x) = x - r$ and so $a = r \in F$. □


Finite-Dimensional Division Algebras over a Finite Field 


The finite-dimensional division algebras over a finite field are also easily 
described: they are all commutative and so are finite fields. The proof, however, 
is a bit more challenging. To understand the proof, we need two facts: the class 
equation and some information about the complex roots of unity. So let us 
briefly describe what we need. 


The Class Equation 


Those who have studied group theory have no doubt encountered the famous
class equation. Let $G$ be a finite group. Each $a \in G$ can be thought of as a
permutation $\sigma_a$ of $G$ defined by
$$\sigma_a x = axa^{-1}$$
for all $x \in G$. The set of all conjugates $axa^{-1}$ of $x$ is denoted by $\{x\}^G$ and so
$$\{x\}^G = \{\sigma_a x \mid a \in G\}$$
This set is also called a conjugacy class in $G$. Now, the following are
equivalent:
$$\sigma_a x = \sigma_b x$$
$$axa^{-1} = bxb^{-1}$$
$$b^{-1}axa^{-1}b = x$$
$$b^{-1}a \in C_G(x)$$
where
$$C_G(x) = \{g \in G \mid gx = xg\}$$
is the centralizer of $x$. But $b^{-1}a \in C_G(x)$ if and only if $a$ and $b$ are in the same
coset of $C_G(x)$. Thus, there is a one-to-one correspondence between the
conjugates of $x$ and the cosets of $C_G(x)$. Hence,
$$|\{x\}^G| = (G : C_G(x))$$
Since the distinct conjugacy classes form a partition of $G$ (because conjugacy is
an equivalence relation), we have
$$|G| = \sum_{x \in S} |\{x\}^G| = \sum_{x \in S} (G : C_G(x))$$
where $S$ is a set consisting of exactly one element from each conjugacy class
$\{x\}^G$. Note that a conjugacy class $\{x\}^G$ has size 1 if and only if $axa^{-1} = x$ for
all $a \in G$, that is, $xa = ax$ for all $a \in G$ and these are precisely the elements in
the center $Z(G)$ of $G$. Hence, the previous equation can be written in the form
$$|G| = |Z(G)| + \sum_{x \in S'} (G : C_G(x))$$
where $S'$ is a set consisting of exactly one element from each conjugacy class
$\{x\}^G$ of size greater than 1. This is the class equation for $G$.
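To make the class equation concrete, here is a small brute-force sketch for symmetric groups, with permutations stored as tuples of images. The helper names (`compose`, `inverse`, `class_equation`) are our own choices. The code also checks the identity $|\{x\}^G| \cdot |C_G(x)| = |G|$ for each class representative.

```python
from itertools import permutations

def compose(p, q):
    """The permutation p∘q (apply q first), both given as tuples of images."""
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def class_equation(group):
    """Return (|Z(G)|, [indices (G : C_G(x)) for one x in each class of size > 1])."""
    group = list(group)
    center_size, indices, seen = 0, [], set()
    for x in group:
        if x in seen:
            continue
        conj_class = {compose(compose(a, x), inverse(a)) for a in group}
        seen |= conj_class
        centralizer = [g for g in group if compose(g, x) == compose(x, g)]
        # size of the class equals the index of the centralizer
        assert len(conj_class) * len(centralizer) == len(group)
        if len(conj_class) == 1:
            center_size += 1
        else:
            indices.append(len(group) // len(centralizer))
    return center_size, indices

S3 = list(permutations(range(3)))
center, indices = class_equation(S3)
print(len(S3), "=", center, "+", " + ".join(map(str, sorted(indices))))
```

For $S_3$ the center is trivial, so the equation splits 6 into the center plus the class of 3-cycles and the class of transpositions.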


The Complex Roots of Unity 


If $n$ is a positive integer, then the complex $n$th roots of unity are the complex
solutions to the equation
$$x^n - 1 = 0$$
The set $U_n$ of complex $n$th roots of unity is a cyclic group of order $n$. To see
this, note first that $U_n$ is an abelian group since $a, b \in U_n$ implies that $ab \in U_n$
and $a^{-1} \in U_n$. Also, since $x^n - 1$ has no multiple roots, $U_n$ has order $n$.

Now, in any finite abelian group $G$, if $m$ is the maximum order of all elements
of $G$, then $g^m = 1$ for all $g \in G$. Thus, if no element of $U_n$ has order $n$, then
$m < n$ and every $g \in U_n$ satisfies the equation $x^m - 1 = 0$, which has fewer
than $n$ solutions. This contradiction implies that some element of $U_n$ must have
order $n$ and so $U_n$ is cyclic.


The elements of $U_n$ that generate $U_n$, that is, the elements of order $n$, are called
the primitive $n$th roots of unity. We denote the set of primitive $n$th roots of
unity by $\Omega_n$. Hence, if $\omega \in \Omega_n$, then
$$\Omega_n = \{\omega^k \mid (n, k) = 1\}$$
has size $\phi(n)$, where $\phi$ is the Euler phi function. (The value $\phi(k)$ is defined to
be the number of positive integers less than or equal to $k$ and relatively prime to
$k$.)

The $n$th cyclotomic polynomial is defined by
$$Q_n(x) = \prod_{\omega \in \Omega_n} (x - \omega)$$
Thus,
$$\deg(Q_n(x)) = \phi(n)$$
Since every $n$th root of unity is a primitive $d$th root of unity for some $d \mid n$ and
since every primitive $d$th root of unity for $d \mid n$ is also an $n$th root of unity, we
deduce that
$$U_n = \bigcup_{d \mid n} \Omega_d$$
where the union is a disjoint one. It follows that
$$x^n - 1 = \prod_{d \mid n} Q_d(x)$$


Finally, we show that $Q_n(x)$ is monic and has integer coefficients by induction
on $n$. It is clear from the definition that $Q_n(x)$ is monic. Since $Q_1(x) = x - 1$,
the result is true for $n = 1$. If $p$ is a prime, then all nonidentity $p$th roots of unity
are primitive and so
$$Q_p(x) = \frac{x^p - 1}{x - 1} = x^{p-1} + x^{p-2} + \cdots + x + 1$$
and the result holds for $n = p$. Assume the result holds for all proper divisors of
$n$. Then
$$x^n - 1 = Q_n(x) \prod_{d \mid n,\ d < n} Q_d(x) = Q_n(x)R(x)$$
By the induction hypothesis, $R(x)$ is monic with integer coefficients. Dividing
$x^n - 1$ by a monic integer polynomial never leaves the integers, and so it follows that
$Q_n(x)$ must also have integer coefficients.
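The induction above is effectively an algorithm: divide $x^n - 1$ by the cyclotomic polynomials of the proper divisors of $n$. The sketch below follows that recipe with integer coefficient lists (lowest degree first). The function names are ours, and the division routine assumes a monic divisor, which is what keeps every step inside the integers.

```python
from functools import lru_cache

def exact_quotient(num, den):
    """Quotient of integer polynomials num / den, where den is monic and divides num.
    Polynomials are coefficient lists, lowest degree first."""
    num = list(num)
    quot = [0] * (len(num) - len(den) + 1)
    for shift in range(len(quot) - 1, -1, -1):
        coeff = num[shift + len(den) - 1]   # den is monic, so no division is needed
        quot[shift] = coeff
        for j, d in enumerate(den):
            num[shift + j] -= coeff * d
    assert not any(num), "division was not exact"
    return quot

@lru_cache(maxsize=None)
def cyclotomic(n):
    """Q_n(x), computed from x^n - 1 = product of Q_d(x) over the divisors d of n."""
    poly = [-1] + [0] * (n - 1) + [1]       # x^n - 1
    for d in range(1, n):
        if n % d == 0:
            poly = exact_quotient(poly, list(cyclotomic(d)))
    return tuple(poly)

print(cyclotomic(6))    # coefficients of Q_6, lowest degree first
print(cyclotomic(12))
```

Because only integer subtraction and multiplication occur, the output is a list of integers by construction, mirroring the proof.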


Wedderburn's Theorem

Now we can prove Wedderburn's theorem.

Theorem 18.11 (Wedderburn's theorem) If $D$ is a finite division algebra, then
$D$ is a field.

Proof. We must show that $D$ is commutative. Let $G = D^*$ be the multiplicative
group of all nonzero elements of $D$. The class equation is
$$|D^*| = |Z(D^*)| + \sum_{\beta} (D^* : C^*(\beta))$$
where the sum is taken over one representative $\beta$ from each conjugacy class of
size greater than 1 and $C^*(\beta)$ is the centralizer of $\beta$ in $D^*$. If we assume for the
purposes of contradiction that $D$ is not commutative, that is, that $Z(D^*) \ne D^*$, then
the sum on the far right is not an empty sum and so $|C^*(\beta)| < |D^*|$ for some $\beta \in D^*$.

The sets $Z(D)$ and $C(\beta)$ are subalgebras of $D$ and, in fact, $Z(D)$ is a
commutative division algebra; that is, a field. Let $|Z(D)| = z \ge 2$. Since
$Z(D) \subseteq C(\beta)$, we may view $C(\beta)$ and $D$ as vector spaces over $Z(D)$ and so
$$|C(\beta)| = z^{k(\beta)} \quad \text{and} \quad |D| = z^n$$
for integers $1 \le k(\beta) < n$. The class equation now gives
$$z^n - 1 = z - 1 + \sum_{\beta} \frac{z^n - 1}{z^{k(\beta)} - 1}$$
and since $z^{k(\beta)} - 1 \mid z^n - 1$, it follows that $k(\beta) \mid n$.

If $Q_n(x)$ is the $n$th cyclotomic polynomial, then $Q_n(z)$ divides $z^n - 1$. But
$Q_n(z)$ also divides each summand on the far right above, since the roots of $Q_n(x)$
are not roots of $x^{k(\beta)} - 1$. It follows that $Q_n(z) \mid z - 1$. On the other hand,
$$Q_n(z) = \prod_{\omega \in \Omega_n} (z - \omega)$$
and since $n > 1$, each $\omega \in \Omega_n$ satisfies $|z - \omega| > z - 1$. Hence $|Q_n(z)| > z - 1$,
which contradicts $Q_n(z) \mid z - 1$. Hence $Z(D^*) = D^*$ and $D$ is commutative, that is,
$D$ is a field. □
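The final inequality in the proof can be spot-checked numerically. The sketch below evaluates $|Q_n(z)|$ as a product over the primitive $n$th roots of unity and compares it with $z - 1$ for a few small integers. The function name `cyclotomic_abs` is ours, and floating-point arithmetic is used, so this is a sanity check rather than an argument.

```python
import cmath
from math import gcd

def cyclotomic_abs(n, z):
    """|Q_n(z)| as the product of |z - w| over the primitive nth roots of unity w."""
    value = 1.0
    for k in range(1, n + 1):
        if gcd(k, n) == 1:
            w = cmath.exp(2j * cmath.pi * k / n)
            value *= abs(z - w)
    return value

# For n > 1 every primitive root w differs from 1, so each factor exceeds z - 1.
for n in range(2, 13):
    for z in range(2, 8):
        assert cyclotomic_abs(n, z) > z - 1
```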

Finite-Dimensional Real Division Algebras


We now consider the finite-dimensional division algebras over the real field R. 
In 1877, Frobenius proved that there are only three such division algebras. 


Theorem 18.12 (Frobenius, 1877) If $D$ is a finite-dimensional division algebra
over $\mathbb{R}$, then
$$D = \mathbb{R}, \quad D = \mathbb{C} \quad \text{or} \quad D = \mathbb{H}$$
Proof. Note first that the minimal polynomial $m_a(x)$ of any $a \in D$ is either
linear, in which case $a \in \mathbb{R}$, or irreducible quadratic $m_a(x) = x^2 + rx + s$ with
$r^2 - 4s < 0$. Completing the square gives
$$0 = m_a(a) = a^2 + ra + s = \left(a + \tfrac{1}{2}r\right)^2 + \tfrac{1}{4}(4s - r^2)$$
Hence, any $a \in D$ has the form
$$a = t + \alpha$$
where $t \in \mathbb{R}$ and either $\alpha = 0$ or $\alpha^2 < 0$. In the latter case, $\alpha^2 \in \mathbb{R}$ but $\alpha \notin \mathbb{R}$. Thus, every
element of $D$ is the sum of an element of $\mathbb{R}$ and an element of the set
$$D' = \{a \in D \mid a^2 \le 0\}$$
that is, as sets:
$$D = \mathbb{R} + D'$$
Also, $\mathbb{R} \cap D' = \{0\}$. To see that $D'$ is a subspace of $D$, let $u, v \in D'$. We wish
to show that $u + v \in D'$. If $v = ru$ for some $r \in \mathbb{R}$, then
$u + v = (1 + r)u \in D'$. So assume that $u$ and $v$ are linearly independent. Then
$u$ and $v$ are nonzero and so also nonreal.

Now, $u + v$ and $u - v$ cannot both be real, since then $u$ and $v$ would be real. We
have seen that
$$u + v = r + \alpha$$
and
$$u - v = s + \beta$$
where $r, s \in \mathbb{R}$, at least one of $\alpha$ or $\beta$ is nonzero and $\alpha^2, \beta^2 \le 0$. Then
$$(u + v)^2 + (u - v)^2 = (r + \alpha)^2 + (s + \beta)^2$$
and so
$$2u^2 + 2v^2 = r^2 + 2r\alpha + \alpha^2 + s^2 + 2s\beta + \beta^2$$
Collecting the real part on one side gives
$$2r\alpha + 2s\beta = 2u^2 + 2v^2 - (r^2 + \alpha^2 + s^2 + \beta^2)$$
Now, if we knew that $\alpha$, $\beta$ and $1$ were linearly independent over $\mathbb{R}$ we could
conclude that $r = s = 0$ and so
$$(u + v)^2 = \alpha^2 \le 0 \quad \text{and} \quad (u - v)^2 = \beta^2 \le 0$$
which shows that $u + v$ and $u - v$ are in $D'$.

To see that $\{\alpha, \beta, 1\}$ is linearly independent, it is equivalent to show that
$\{u, v, 1\}$ is linearly independent. But if
$$v = au + b$$
for $a, b \in \mathbb{R}$, then
$$v^2 = a^2u^2 + 2abu + b^2$$
and since $u \notin \mathbb{R}$, it follows that $ab = 0$ and so $a = 0$ or $b = 0$. But $a \ne 0$ since
$v \notin \mathbb{R}$ and $b \ne 0$ since $\{u, v\}$ are linearly independent.

Thus, $D'$ is a subspace of $D$ and
$$D = \mathbb{R} \oplus D'$$
We now look at $D'$, which is a real vector space. If $D' = \{0\}$, then $D = \mathbb{R}$ and
we are done, so assume otherwise. If $a \in D'$ is nonzero, then $a^2 = -r^2$ where
$0 \ne r \in \mathbb{R}$. Hence, $i = ar^{-1} \in D'$ satisfies $i^2 = -1$. If
$$D' = \mathbb{R}i = \{ri \mid r \in \mathbb{R}\}$$
then $D = \mathbb{R} \oplus \mathbb{R}i = \mathbb{C}$ and we are done. If not, then $\mathbb{R}i$ is a proper subspace of
$D'$.

In the quaternion algebra $\mathbb{H}$, there is an element $j$ for which $ij + ji = 0$. So we seek a
$j \in D' \setminus \mathbb{R}i$ with this property. To this end, define a bilinear form on $D'$ by
$$\langle u, v \rangle = -(uv + vu)$$
Then it is easy to see that this form is a real inner product on $D'$ (positive
definite, symmetric and bilinear). Hence, if $\mathbb{R}i$ is a proper subspace of $D'$, then
$$D' = \mathbb{R}i \odot S$$
where $\odot$ denotes the orthogonal direct sum. If $u \in S$ is nonzero, then
$u^2 = -r^2$ for some nonzero $r \in \mathbb{R}$ and so if $j = ur^{-1}$, then
$$j^2 = -1 \quad \text{and} \quad ji + ij = 0$$
Now, $\mathbb{R}j$ is a subspace of $S$ and so
$$D' = \mathbb{R}i \odot \mathbb{R}j \odot T$$
Setting $k = ij$, we have
$$-\langle i, k \rangle = ik + ki = iij + iji = 0$$
and
$$-\langle j, k \rangle = jk + kj = jij + ijj = 0$$
and so $k \in T$ and we can write
$$D' = \mathbb{R}i \odot \mathbb{R}j \odot \mathbb{R}k \odot U$$
Now, if $U \ne \{0\}$, then there is a $u \in U$ for which $u^2 = -1$ and
$$ui = -iu$$
$$uj = -ju$$
$$uk = -ku$$
The third equation is $u(ij) = -(ij)u$ and so
$$u(ij) = -(ij)u = iuj = -uij$$
whence $uij = 0$, which is false. Hence, $U = \{0\}$ and
$$D = \mathbb{R} \oplus \mathbb{R}i \oplus \mathbb{R}j \oplus \mathbb{R}k = \mathbb{H}$$
This completes the proof. □


Exercises

1. Prove that the subalgebra generated by a nonempty subset $X$ of an algebra
   $A$ is the subspace spanned by the products of finite subsets of elements of
   $X$:
   $$\langle X \rangle_{\mathrm{alg}} = \langle x_1 \cdots x_n \mid x_i \in X \rangle$$
2. Verify that the group algebra $F[G]$ is indeed an associative algebra over $F$.
3. Show that the kernel of an algebra homomorphism is an ideal.
4. Let $A$ be a finite-dimensional algebra over $F$ and let $B$ be a subalgebra.
   Show that if $b \in B$ is invertible, then $b^{-1} \in B$.
5. If $A$ is an algebra and $S \subseteq A$ is nonempty, define the centralizer $C_A(S)$ of
   $S$ to be the set of elements of $A$ that commute with all elements of $S$. Prove
   that $C_A(S)$ is a subalgebra of $A$.
6. Show that $\mathbb{Z}_6$ is not an algebra over any field.
7. Let $A = F[a]$ be the algebra generated over $F$ by a single algebraic element
   $a$. Show that $A$ is isomorphic to the quotient algebra $F[x]/\langle f(x) \rangle$, where
   $\langle f(x) \rangle$ is the ideal generated by $f(x) \in F[x]$. What can you say about
   $f(x)$? What is the dimension of $A$? What happens if $a$ is not algebraic?
8. Let $G = \{a_1 = 1, \ldots, a_n\}$ be a finite group. For $x \in F[G]$ of the form
   $$x = r_1a_1 + \cdots + r_na_n$$
   let $T(x) = r_1 + \cdots + r_n$. Prove that $T\colon F[G] \to F$ is an algebra
   homomorphism, where $F$ is an algebra over itself.
9. Prove the first isomorphism theorem of algebras: A homomorphism
   $\sigma\colon A \to B$ of $F$-algebras induces an isomorphism $\bar{\sigma}\colon A/\ker(\sigma) \approx \operatorname{im}(\sigma)$
   defined by $\bar{\sigma}(a + \ker(\sigma)) = \sigma a$.
10. Prove that the quaternions $\mathbb{H}$ form an $\mathbb{R}$-algebra and a division algebra. Hint: For
    $$x = r_0 + r_1i + r_2j + r_3k \ne 0$$
    (where $r_0 = r_0 1$) consider
    $$\bar{x} = r_0 - r_1i - r_2j - r_3k$$
11. Describe the left regular representation of the quaternions using the ordered
    basis $\mathcal{B} = (1, i, j, k)$.
12. Let $S_n$ be the group of permutations (bijective functions) of the ordered set
    $X = (x_1, \ldots, x_n)$, under composition. Verify the following statements.
    Each $\sigma \in S_n$ defines a linear isomorphism $\tau_\sigma$ on the vector space $V$ with
    basis $X$ over a field $F$. This defines an algebra homomorphism
    $f\colon F[S_n] \to \mathcal{L}(V)$ with the property that $f(\sigma) = \tau_\sigma$. What does the matrix
    representation of a $\sigma \in S_n$ look like? Is the representation $f$ faithful?
13. Show that the center of the algebra $\mathcal{L}_F(V)$ is
    $$Z = \{r\iota \mid r \in F\}$$
    where $\iota$ is the identity operator.
14. Show that $\mathcal{L}_F(V)$ is simple if and only if $\dim(V) < \infty$.
15. Prove that for n > 3, the matrix algebras $M_n(F)$ are central and simple.
16. An element $a \in A$ is left-invertible if there is a $b \in A$ for which $ba = 1$, in
    which case $b$ is called a left inverse of $a$. Similarly, $a \in A$ is right-invertible
    if there is a $b \in A$ for which $ab = 1$, in which case $b$ is called a
    right inverse of $a$. Left and right inverses are called one-sided inverses
    and an ordinary inverse is called a two-sided inverse. Let $a \in A$ be
    algebraic over $F$.
    a) Prove that $ab = 0$ for some $b \ne 0$ if and only if $ca = 0$ for some $c \ne 0$.
       Does $c$ necessarily equal $b$?
    b) Prove that if $a$ has a one-sided inverse $b$, then $b$ is a two-sided inverse.
       Does this hold if $a$ is not algebraic? Hint: Consider the algebra
       $A = \mathcal{L}_F(F[x])$.
    c) Let $a, b \in A$ be algebraic. Show that $ab$ is invertible if and only if $a$
       and $b$ are invertible, in which case $ba$ is also invertible.


Chapter 19 
The Umbral Calculus 


In this chapter, we give a brief introduction to an area called the umbral 
calculus. This is a linear-algebraic theory used to study certain types of 
polynomial functions that play an important role in applied mathematics. We 
give only a brief introduction to the subject, emphasizing the algebraic aspects 
rather than the applications. For more on the umbral calculus, may we suggest 
The Umbral Calculus, by Roman [1984]? 


One bit of notation: The lower factorial numbers are defined by
$$(n)_k = n(n - 1) \cdots (n - k + 1)$$


Formal Power Series 


We begin with a few remarks concerning formal power series. Let $F$ denote the
algebra of formal power series in the variable $t$, with complex coefficients.
Thus, $F$ is the set of all formal sums of the form
$$f(t) = \sum_{k=0}^{\infty} a_k t^k \qquad (19.1)$$
where $a_k \in \mathbb{C}$ (the complex numbers). Addition and multiplication are purely
formal:
$$\sum_{k=0}^{\infty} a_k t^k + \sum_{k=0}^{\infty} b_k t^k = \sum_{k=0}^{\infty} (a_k + b_k) t^k$$
and
$$\left(\sum_{k=0}^{\infty} a_k t^k\right)\left(\sum_{k=0}^{\infty} b_k t^k\right) = \sum_{k=0}^{\infty} \left(\sum_{j=0}^{k} a_j b_{k-j}\right) t^k$$
The order $o(f)$ of $f$ is the smallest exponent of $t$ that appears with a nonzero
coefficient. The order of the zero series is defined to be $+\infty$. Note that a series
$f$ has a multiplicative inverse, denoted by $f^{-1}$, if and only if $o(f) = 0$. We
leave it to the reader to show that
$$o(fg) = o(f) + o(g)$$
and
$$o(f + g) \ge \min\{o(f), o(g)\}$$
If $f_k$ is a sequence in $F$ with $o(f_k) \to \infty$ as $k \to \infty$, then for any series
$$g(t) = \sum_{k=0}^{\infty} b_k t^k$$
we may substitute $f_k$ for $t^k$ to get the series
$$h(t) = \sum_{k=0}^{\infty} b_k f_k(t)$$
which is well-defined since the coefficient of each power of $t$ is a finite sum. In
particular, if $o(f) \ge 1$, then $o(f^k) \to \infty$ and so the composition
$$(g \circ f)(t) = g(f(t)) = \sum_{k=0}^{\infty} b_k f^k(t)$$
is well-defined. It is easy to see that $o(g \circ f) = o(g)o(f)$.

If $o(f) = 1$, then $f$ has a compositional inverse, denoted by $\bar{f}$ and satisfying
$$(f \circ \bar{f})(t) = (\bar{f} \circ f)(t) = t$$
A series $f$ with $o(f) = 1$ is called a delta series.

The sequence of powers $f^k$ of a delta series $f$ forms a pseudobasis for $F$, in the
sense that for any $g \in F$, there exists a unique sequence of constants $a_k$ for
which
$$g(t) = \sum_{k=0}^{\infty} a_k f^k(t)$$
Finally, we note that the formal derivative of the series (19.1) is given by
$$\partial_t f(t) = f'(t) = \sum_{k=1}^{\infty} k a_k t^{k-1}$$
The operator $\partial_t$ is a derivation, that is,
$$\partial_t(fg) = \partial_t(f)g + f\partial_t(g)$$
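The operations just described can be tried out on truncated series. In the sketch below a series is a list of its first `N` coefficients. The truncation length `N` and the helper names (`series`, `mul`, `order`, `derivative`, `compose`) are our own assumptions, and every statement is exact only below degree `N`.

```python
from fractions import Fraction

N = 8  # we keep the coefficients of t^0, ..., t^(N-1) only

def series(*coeffs):
    """Pad a short coefficient list with zeros up to length N."""
    return [Fraction(c) for c in coeffs] + [Fraction(0)] * (N - len(coeffs))

def mul(f, g):
    """Cauchy product, truncated at degree N - 1."""
    h = [Fraction(0)] * N
    for i, a in enumerate(f):
        for j, b in enumerate(g[:N - i]):
            h[i + j] += a * b
    return h

def order(f):
    """Smallest exponent with a nonzero coefficient (None if none is visible below N)."""
    return next((k for k, a in enumerate(f) if a != 0), None)

def derivative(f):
    """Formal derivative; the top coefficient is unknown after truncation and set to 0."""
    return [k * f[k] for k in range(1, N)] + [Fraction(0)]

def compose(g, f):
    """g(f(t)) = sum of b_k f^k, which requires o(f) >= 1."""
    assert f[0] == 0
    result, power = series(), series(1)
    for b in g:
        result = [r + b * p for r, p in zip(result, power)]
        power = mul(power, f)
    return result

f = series(0, 1, 1)      # t + t^2, a delta series
g = series(*[1] * N)     # 1/(1 - t), truncated
print(compose(g, f))     # coefficients of 1/(1 - t - t^2)
```

Composition is exact below degree `N` because $o(f^k) \ge k$, so powers $f^k$ with $k \ge N$ cannot contribute to the coefficients that are kept.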




The Umbral Algebra 


Let $P = \mathbb{C}[x]$ denote the algebra of polynomials in a single variable $x$ over the
complex field. One of the starting points of the umbral calculus is the fact that
any formal power series in $F$ can play three different roles: as a formal power
series, as a linear functional on $P$ and as a linear operator on $P$. Let us first
explore the connection between formal power series and linear functionals.

Let $P^*$ denote the vector space of all linear functionals on $P$. Note that $P^*$ is the
algebraic dual space of $P$, as defined in Chapter 2. It will be convenient to
denote the action of $L \in P^*$ on $p(x) \in P$ by
$$\langle L \mid p(x) \rangle$$
(This is the "bra-ket" notation of Paul Dirac.) The vector space operations on $P^*$
then take the form
$$\langle L + M \mid p(x) \rangle = \langle L \mid p(x) \rangle + \langle M \mid p(x) \rangle$$
and
$$\langle rL \mid p(x) \rangle = r\langle L \mid p(x) \rangle, \quad r \in \mathbb{C}$$
Note also that since any linear functional on $P$ is uniquely determined by its
values on a basis for $P$, the functional $L \in P^*$ is uniquely determined by the
values $\langle L \mid x^n \rangle$ for $n \ge 0$.

Now, any formal series in $F$ can be written in the form
$$f(t) = \sum_{k=0}^{\infty} \frac{a_k}{k!} t^k$$
and we can use this to define a linear functional $f(t)$ by setting
$$\langle f(t) \mid x^n \rangle = a_n$$
for $n \ge 0$. In other words, the linear functional $f(t)$ is defined by
$$f(t) = \sum_{k=0}^{\infty} \frac{\langle f(t) \mid x^k \rangle}{k!} t^k$$
where the expression $f(t)$ on the left is just a formal power series. Note in
particular that
$$\langle t^k \mid x^n \rangle = n!\,\delta_{n,k}$$
where $\delta_{n,k}$ is the Kronecker delta function. This implies that
$$\langle t^k \mid p(x) \rangle = p^{(k)}(0)$$
and so $t^k$ is the functional "$k$th derivative at 0." Also, $t^0$ is evaluation at 0.


As it happens, any linear functional $L$ on $P$ has the form $f(t)$. To see this, we
simply note that if
$$f_L(t) = \sum_{k=0}^{\infty} \frac{\langle L \mid x^k \rangle}{k!} t^k$$
then
$$\langle f_L(t) \mid x^n \rangle = \langle L \mid x^n \rangle$$
for all $n \ge 0$ and so as linear functionals, $L = f_L(t)$.
Thus, we can define a map $\phi\colon P^* \to F$ by $\phi(L) = f_L(t)$.

Theorem 19.1 The map $\phi\colon P^* \to F$ defined by $\phi(L) = f_L(t)$ is a vector space
isomorphism from $P^*$ onto $F$.
Proof. To see that $\phi$ is injective, note that
$$f_L(t) = f_M(t) \;\Rightarrow\; \langle L \mid x^n \rangle = \langle M \mid x^n \rangle \text{ for all } n \ge 0 \;\Rightarrow\; L = M$$
Moreover, the map $\phi$ is surjective, since for any $f \in F$, the linear functional
$L = f(t)$ has the property that $\phi(L) = f_L(t) = f(t)$. Finally,
$$\phi(rL + sM) = \sum_{k=0}^{\infty} \frac{\langle rL + sM \mid x^k \rangle}{k!} t^k = r\sum_{k=0}^{\infty} \frac{\langle L \mid x^k \rangle}{k!} t^k + s\sum_{k=0}^{\infty} \frac{\langle M \mid x^k \rangle}{k!} t^k = r\phi(L) + s\phi(M)$$
□

From now on, we shall identify the vector space $P^*$ with the vector space $F$,
using the isomorphism $\phi\colon P^* \to F$. Thus, we think of linear functionals on $P$
simply as formal power series. The advantage of this approach is that $F$ is more
than just a vector space: it is an algebra. Hence, we have automatically defined
a multiplication of linear functionals, namely, the product of formal power
series. The algebra $F$, when thought of as both the algebra of formal power
series and the algebra of linear functionals on $P$, is called the umbral algebra.


Let us consider an example. 


Example 19.1 For $a \in \mathbb{C}$, the evaluation functional $\varepsilon_a \in P^*$ is defined by
$$\langle \varepsilon_a \mid p(x) \rangle = p(a)$$
In particular, $\langle \varepsilon_a \mid x^n \rangle = a^n$ and so the formal power series representation for
this functional is
$$f_{\varepsilon_a}(t) = \sum_{k=0}^{\infty} \frac{a^k}{k!} t^k = e^{at}$$
which is the exponential series. If $e^{bt}$ is evaluation at $b$, then
$$e^{at}e^{bt} = e^{(a+b)t}$$
and so the product of evaluation at $a$ and evaluation at $b$ is evaluation at
$a + b$. □

When we are thinking of a delta series $f \in F$ as a linear functional, we refer to
it as a delta functional. Similarly, an invertible series $f \in F$ is referred to as an
invertible functional. Here are some simple consequences of the development
so far.


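Theorem 19.2
1) For any $f \in F$,
$$f(t) = \sum_{k=0}^{\infty} \frac{\langle f(t) \mid x^k \rangle}{k!} t^k$$
2) For any $p \in P$,
$$p(x) = \sum_{k \ge 0} \frac{\langle t^k \mid p(x) \rangle}{k!} x^k$$
3) For any $f, g \in F$,
$$\langle f(t)g(t) \mid x^n \rangle = \sum_{k=0}^{n} \binom{n}{k} \langle f(t) \mid x^k \rangle \langle g(t) \mid x^{n-k} \rangle$$
4) $o(f(t)) > \deg p(x) \;\Rightarrow\; \langle f(t) \mid p(x) \rangle = 0$
5) If $o(f_k) = k$ for all $k \ge 0$, then
$$\left\langle \sum_{k=0}^{\infty} a_k f_k(t) \;\Big|\; p(x) \right\rangle = \sum_{k \ge 0} a_k \langle f_k(t) \mid p(x) \rangle$$
where the sum on the right is a finite one.
6) If $o(f_k) = k$ for all $k \ge 0$, then
$$\langle f_k(t) \mid p(x) \rangle = \langle f_k(t) \mid q(x) \rangle \text{ for all } k \ge 0 \;\Rightarrow\; p(x) = q(x)$$
7) If $\deg p_k(x) = k$ for all $k \ge 0$, then
$$\langle f(t) \mid p_k(x) \rangle = \langle g(t) \mid p_k(x) \rangle \text{ for all } k \ge 0 \;\Rightarrow\; f(t) = g(t)$$
Proof. We prove only part 3). Let
$$f(t) = \sum_{k=0}^{\infty} \frac{a_k}{k!} t^k \quad \text{and} \quad g(t) = \sum_{k=0}^{\infty} \frac{b_k}{k!} t^k$$
Then
$$f(t)g(t) = \sum_{m=0}^{\infty} \frac{1}{m!} \left( \sum_{k=0}^{m} \binom{m}{k} a_k b_{m-k} \right) t^m$$
and applying both sides of this (as linear functionals) to $x^n$ gives
$$\langle f(t)g(t) \mid x^n \rangle = \sum_{k=0}^{n} \binom{n}{k} a_k b_{n-k}$$
The result now follows from the fact that part 1) implies $a_k = \langle f(t) \mid x^k \rangle$ and
$b_{n-k} = \langle g(t) \mid x^{n-k} \rangle$. □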
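Part 3) can be checked on the evaluation functionals of Example 19.1. The sketch below represents a series by its first `N` coefficients and uses $\langle f(t) \mid x^n \rangle = n!\,[t^n]f(t)$. The names `exp_series`, `mul` and `pair` are ours, and the values $a = 2$, $b = 3$ are arbitrary sample inputs.

```python
from fractions import Fraction
from math import comb, factorial

N = 10  # number of series coefficients kept

def exp_series(a):
    """Coefficients of e^{at} up to t^(N-1)."""
    return [Fraction(a) ** k / factorial(k) for k in range(N)]

def mul(f, g):
    """Truncated product of two series."""
    h = [Fraction(0)] * N
    for i, x in enumerate(f):
        for j, y in enumerate(g[:N - i]):
            h[i + j] += x * y
    return h

def pair(f, n):
    """<f(t) | x^n> = n! * (coefficient of t^n in f)."""
    return factorial(n) * f[n]

f, g = exp_series(2), exp_series(3)   # evaluation at 2 and at 3
for n in range(N):
    rhs = sum(comb(n, k) * pair(f, k) * pair(g, n - k) for k in range(n + 1))
    # the product functional agrees with the binomial sum and with evaluation at 2 + 3
    assert pair(mul(f, g), n) == rhs == 5 ** n
```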


We can now present our first “umbral” result. 


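Theorem 19.3 For any $f(t) \in F$ and $p(x) \in P$,
$$\langle f(t) \mid xp(x) \rangle = \langle \partial_t f(t) \mid p(x) \rangle$$
Proof. By linearity, we need only establish this for $p(x) = x^n$. But if
$$f(t) = \sum_{k=0}^{\infty} \frac{a_k}{k!} t^k$$
then
$$\langle \partial_t f(t) \mid x^n \rangle = \left\langle \sum_{k=1}^{\infty} \frac{a_k}{(k-1)!} t^{k-1} \;\Big|\; x^n \right\rangle = \sum_{k=1}^{\infty} \frac{a_k}{(k-1)!}\, n!\,\delta_{k-1,n} = a_{n+1} = \langle f(t) \mid x^{n+1} \rangle$$
□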


Let us consider a few examples of important linear functionals and their power 
series representations. 


Example 19.2
1) We have already encountered the evaluation functional $e^{at}$, satisfying
$$\langle e^{at} \mid p(x) \rangle = p(a)$$
2) The forward difference functional is the delta functional $e^{at} - 1$,
satisfying
$$\langle e^{at} - 1 \mid p(x) \rangle = p(a) - p(0)$$
3) The Abel functional is the delta functional $te^{at}$, satisfying
$$\langle te^{at} \mid p(x) \rangle = p'(a)$$
4) The invertible functional $(1 - t)^{-1}$ satisfies
$$\langle (1 - t)^{-1} \mid p(x) \rangle = \int_0^{\infty} p(u)e^{-u}\,du$$
as can be seen by setting $p(x) = x^n$ and expanding the expression
$(1 - t)^{-1}$.
5) To determine the linear functional $f$ satisfying
$$\langle f(t) \mid p(x) \rangle = \int_0^a p(u)\,du$$
we observe that
$$f(t) = \sum_{k=0}^{\infty} \frac{\langle f(t) \mid x^k \rangle}{k!} t^k = \sum_{k=0}^{\infty} \frac{a^{k+1}}{(k+1)!} t^k = \frac{e^{at} - 1}{t}$$
The inverse $t/(e^{at} - 1)$ of this functional is associated with the Bernoulli
polynomials, which play a very important role in mathematics and its
applications. In fact, the numbers
$$B_n = \left\langle \frac{t}{e^t - 1} \;\Big|\; x^n \right\rangle$$
are known as the Bernoulli numbers. □
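The last formula gives a direct way to compute Bernoulli numbers: invert the series $(e^t - 1)/t$ term by term and multiply the $n$th coefficient by $n!$. The sketch below does this with exact rationals. The function name `bernoulli_numbers` is ours, and the sign convention ($B_1 = -1/2$) is the one forced by the generating function $t/(e^t - 1)$.

```python
from fractions import Fraction
from math import factorial

def bernoulli_numbers(count):
    """B_0, ..., B_{count-1} as n! times the coefficients of t/(e^t - 1)."""
    # (e^t - 1)/t = sum_k t^k/(k+1)! has constant term 1, so it is invertible.
    c = [Fraction(1, factorial(k + 1)) for k in range(count)]
    inv = [Fraction(0)] * count
    inv[0] = 1 / c[0]
    for n in range(1, count):
        # the coefficient of t^n in the product c * inv must vanish for n >= 1
        inv[n] = -sum(c[k] * inv[n - k] for k in range(1, n + 1)) / c[0]
    return [factorial(n) * inv[n] for n in range(count)]

print(bernoulli_numbers(7))
```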


Formal Power Series as Linear Operators 


We now turn to the connection between formal power series and linear
operators on $P$. Let us denote the $k$th derivative operator on $P$ by $t^k$. Thus,
$$t^k p(x) = p^{(k)}(x)$$
We can then extend this to formal series in $t$,
$$f(t) = \sum_{k=0}^{\infty} \frac{a_k}{k!} t^k \qquad (19.2)$$
by defining the linear operator $f(t)\colon P \to P$ by
$$f(t)p(x) = \sum_{k=0}^{\infty} \frac{a_k}{k!} \left[t^k p(x)\right] = \sum_{k \ge 0} \frac{a_k}{k!} p^{(k)}(x)$$
the latter sum being a finite one. Note in particular that
$$f(t)x^n = \sum_{k=0}^{n} \binom{n}{k} a_k x^{n-k} \qquad (19.3)$$
With this definition, we see that each formal power series $f \in F$ plays three
roles in the umbral calculus, namely, as a formal power series, as a linear
functional and as a linear operator. The two notations $\langle f(t) \mid p(x) \rangle$ and
$f(t)p(x)$ will make it clear whether we are thinking of $f$ as a functional or as an
operator.

It is important to note that $f = g$ in $F$ if and only if $f = g$ as linear functionals,
which holds if and only if $f = g$ as linear operators. It is also worth noting that
$$[f(t)g(t)]p(x) = f(t)[g(t)p(x)]$$
and so we may write $f(t)g(t)p(x)$ without ambiguity. In addition,
$$f(t)g(t)p(x) = g(t)f(t)p(x)$$
for all $f, g \in F$ and $p \in P$.

When we are thinking of a delta series $f$ as an operator, we call it a delta
operator. The following theorem describes the key relationship between linear
functionals and linear operators of the form $f(t)$.

Theorem 19.4 If $f, g \in F$, then
$$\langle f(t)g(t) \mid p(x) \rangle = \langle f(t) \mid g(t)p(x) \rangle$$
for all polynomials $p(x) \in P$.
Proof. If $f$ has the form (19.2), then by (19.3),
$$\langle t^0 \mid f(t)x^n \rangle = \left\langle t^0 \;\Big|\; \sum_{k=0}^{n} \binom{n}{k} a_k x^{n-k} \right\rangle = a_n = \langle f(t) \mid x^n \rangle \qquad (19.4)$$
By linearity, this holds for $x^n$ replaced by any polynomial $p(x)$. Hence,
applying this to the product $fg$ gives
$$\langle f(t)g(t) \mid p(x) \rangle = \langle t^0 \mid f(t)g(t)p(x) \rangle = \langle t^0 \mid f(t)[g(t)p(x)] \rangle = \langle f(t) \mid g(t)p(x) \rangle$$
□

Equation (19.4) shows that applying the linear functional $f(t)$ is equivalent to
applying the operator $f(t)$ and then following by evaluation at $x = 0$.


Here are the operator versions of the functionals in Example 19.2.

Example 19.3
1) The operator $e^{at}$ satisfies
$$e^{at}x^n = \sum_{k=0}^{\infty} \frac{a^k}{k!} t^k x^n = \sum_{k=0}^{n} \binom{n}{k} a^k x^{n-k} = (x + a)^n$$
and so
$$e^{at}p(x) = p(x + a)$$
for all $p \in P$. Thus $e^{at}$ is a translation operator.
2) The forward difference operator is the delta operator $e^{at} - 1$, where
$$(e^{at} - 1)p(x) = p(x + a) - p(x)$$
3) The Abel operator is the delta operator $te^{at}$, where
$$te^{at}p(x) = p'(x + a)$$
4) The invertible operator $(1 - t)^{-1}$ satisfies
$$(1 - t)^{-1}p(x) = \int_0^{\infty} p(x + u)e^{-u}\,du$$
5) The operator $(e^{at} - 1)/t$ is easily seen to satisfy
$$\frac{e^{at} - 1}{t}\,p(x) = \int_x^{x+a} p(u)\,du$$
□

We have seen that all linear functionals on $P$ have the form $f(t)$, for $f \in F$.
However, not all linear operators on $P$ have this form. To see this, observe that
$$\deg[f(t)p(x)] \le \deg p(x)$$
but the linear operator $\phi\colon P \to P$ defined by $\phi(p(x)) = xp(x)$ does not have
this property.

Let us characterize the linear operators of the form $f(t)$. First, we need a lemma.

Lemma 19.5 If $T$ is a linear operator on $P$ and $Tf(t) = f(t)T$ for some delta
series $f(t)$, then $\deg(Tp(x)) \le \deg(p(x))$.
Proof. For any $m \ge 0$,
$$\deg(Tx^m) - 1 = \deg(f(t)Tx^m) = \deg(Tf(t)x^m)$$
and so
$$\deg(Tx^m) = \deg(Tf(t)x^m) + 1$$
Since $\deg(f(t)x^m) = m - 1$ we have the basis for an induction. When $m = 0$,
we have $f(t)T1 = Tf(t)1 = 0$, so $T1$ is constant and $\deg(T1) \le 0$. Assume that
the result is true for $m - 1$. Then
$$\deg(Tx^m) = \deg(Tf(t)x^m) + 1 \le m - 1 + 1 = m$$
□


Theorem 19.6 The following are equivalent for a linear operator $T\colon P \to P$.
1) $T$ has the form $f(t)$, that is, there exists an $f \in F$ for which $T = f(t)$, as
linear operators.
2) $T$ commutes with the derivative operator, that is, $Tt = tT$.
3) $T$ commutes with any delta operator $g(t)$, that is, $Tg(t) = g(t)T$.
4) $T$ commutes with any translation operator, that is, $Te^{at} = e^{at}T$.
Proof. It is clear that 1) implies 2). For the converse, let
$$g(t) = \sum_{k=0}^{\infty} \frac{\langle t^0 \mid Tx^k \rangle}{k!} t^k$$
Then
$$\langle g(t) \mid x^k \rangle = \langle t^0 \mid Tx^k \rangle$$
Now, since $T$ commutes with $t$, we have
$$\langle t^n \mid Tx^k \rangle = \langle t^0 \mid t^n Tx^k \rangle = \langle t^0 \mid Tt^n x^k \rangle = (k)_n \langle t^0 \mid Tx^{k-n} \rangle = (k)_n \langle t^0 \mid g(t)x^{k-n} \rangle = \langle t^n \mid g(t)x^k \rangle$$
and since this holds for all $n$ and $k$ we get $T = g(t)$. We leave the rest of the
proof as an exercise. □


Sheffer Sequences 


We can now define the principal object of study in the umbral calculus. When
referring to a sequence $s_n(x)$ in $P$, we shall always assume that $\deg s_n(x) = n$
for all $n \ge 0$.

Theorem 19.7 Let $f$ be a delta series, let $g$ be an invertible series and consider
the geometric sequence
$$g, \; gf, \; gf^2, \; gf^3, \ldots$$
in $F$. Then there is a unique sequence $s_n(x)$ in $P$ satisfying the orthogonality
conditions
$$\langle g(t)f^k(t) \mid s_n(x) \rangle = n!\,\delta_{n,k} \qquad (19.5)$$
for all $n, k \ge 0$.
Proof. The uniqueness follows from Theorem 19.2. For the existence, if we set
$$s_n(x) = \sum_{j=0}^{n} a_{n,j} x^j$$
and
$$g(t)f^k(t) = \sum_{i=k}^{\infty} b_{k,i} t^i$$
where $b_{k,k} \ne 0$, then (19.5) is
$$n!\,\delta_{n,k} = \left\langle \sum_{i=k}^{\infty} b_{k,i} t^i \;\Big|\; \sum_{j=0}^{n} a_{n,j} x^j \right\rangle = \sum_{i=k}^{\infty} \sum_{j=0}^{n} b_{k,i} a_{n,j} \langle t^i \mid x^j \rangle = \sum_{i=k}^{n} b_{k,i} a_{n,i}\, i!$$
Taking $k = n$ we get
$$a_{n,n} = \frac{1}{b_{n,n}}$$
For $k = n - 1$ we have
$$0 = b_{n-1,n-1} a_{n,n-1} (n - 1)! + b_{n-1,n} a_{n,n}\, n!$$
and using the fact that $a_{n,n} = 1/b_{n,n}$ we can solve this for $a_{n,n-1}$. By
successively taking $k = n, n - 1, n - 2, \ldots$ we can solve the resulting
equations for the coefficients $a_{n,k}$ of the sequence $s_n(x)$. □
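The existence proof is constructive, so it can be turned into a short program: compute the coefficients $b_{k,i}$ of $g f^k$ and solve the triangular system from $k = n$ downward. The sketch below does this with truncated series. The function names are ours, and the two test pairs, $(1, t)$ giving $x^n$ and $(1, e^t - 1)$ giving the lower factorial polynomials, are standard facts used here only as checks.

```python
from fractions import Fraction
from math import factorial

def mul(f, g, size):
    """Product of two series, truncated to `size` coefficients."""
    h = [Fraction(0)] * size
    for i, a in enumerate(f[:size]):
        for j, b in enumerate(g[:size - i]):
            h[i + j] += a * b
    return h

def sheffer(g, f, count):
    """Coefficient lists (lowest degree first) of s_0, ..., s_{count-1} for the pair (g, f).
    g must have a nonzero constant term; f must have order exactly 1."""
    rows = []                                  # rows[k][i] = b_{k,i}, coefficients of g * f^k
    power = mul(g, [Fraction(1)], count)
    for _ in range(count):
        rows.append(power)
        power = mul(power, f, count)
    result = []
    for n in range(count):
        a = [Fraction(0)] * (n + 1)
        for k in range(n, -1, -1):             # back-substitution: k = n, n-1, ..., 0
            rhs = factorial(n) if k == n else 0
            rhs -= sum(rows[k][i] * a[i] * factorial(i) for i in range(k + 1, n + 1))
            a[k] = rhs / (rows[k][k] * factorial(k))
        result.append(a)
    return result

count = 5
one = [Fraction(1)]
exp_minus_one = [Fraction(0)] + [Fraction(1, factorial(k)) for k in range(1, count)]
print(sheffer(one, exp_minus_one, count)[3])   # coefficients of x(x-1)(x-2)
```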


Definition The sequence $s_n(x)$ in (19.5) is called the Sheffer sequence for the
ordered pair $(g(t), f(t))$. We shorten this by saying that $s_n(x)$ is Sheffer for
$(g(t), f(t))$. □

Two special types of Sheffer sequences deserve explicit mention.

Definition The Sheffer sequence for a pair of the form $(1, f(t))$ is called the
associated sequence for $f(t)$. The Sheffer sequence for a pair of the form
$(g(t), t)$ is called the Appell sequence for $g(t)$. □

Note that the sequence $s_n(x)$ is Sheffer for $(g(t), f(t))$ if and only if
$$\langle g(t)f^k(t) \mid s_n(x) \rangle = n!\,\delta_{n,k}$$
which is equivalent to
$$\langle f^k(t) \mid g(t)s_n(x) \rangle = n!\,\delta_{n,k}$$
which, in turn, is equivalent to saying that the sequence $p_n(x) = g(t)s_n(x)$ is
the associated sequence for $f(t)$.

Theorem 19.8 The sequence $s_n(x)$ is Sheffer for $(g(t), f(t))$ if and only if the
sequence $p_n(x) = g(t)s_n(x)$ is the associated sequence for $f(t)$. □


Before considering examples, we wish to describe several characterizations of 
Sheffer sequences. First, we require a key result. 


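Theorem 19.9 (The expansion theorems) Let $s_n(x)$ be Sheffer for $(g(t), f(t))$.
1) For any $h \in F$,
$$h(t) = \sum_{k=0}^{\infty} \frac{\langle h(t) \mid s_k(x) \rangle}{k!}\, g(t)f^k(t)$$
2) For any $p \in P$,
$$p(x) = \sum_{k \ge 0} \frac{\langle g(t)f^k(t) \mid p(x) \rangle}{k!}\, s_k(x)$$
Proof. Part 1) follows from Theorem 19.2, since
$$\left\langle \sum_{k=0}^{\infty} \frac{\langle h(t) \mid s_k(x) \rangle}{k!}\, g(t)f^k(t) \;\Big|\; s_n(x) \right\rangle = \sum_{k=0}^{\infty} \frac{\langle h(t) \mid s_k(x) \rangle}{k!}\, n!\,\delta_{n,k} = \langle h(t) \mid s_n(x) \rangle$$
Part 2) follows in a similar way from Theorem 19.2. □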


We can now begin our characterization of Sheffer sequences, starting with the
generating function. The idea of a generating function is quite simple. If $r_n(x)$ is
a sequence of polynomials, we may define a formal power series of the form
$$g(t, x) = \sum_{k=0}^{\infty} \frac{r_k(x)}{k!} t^k$$
This is referred to as the (exponential) generating function for the sequence
$r_n(x)$. (The term exponential refers to the presence of $k!$ in this series. When
this is not present, we have an ordinary generating function.) Since the series is
a formal one, knowing $g(t, x)$ is equivalent (in theory, if not always in practice)
to knowing the polynomials $r_n(x)$. Moreover, a knowledge of the generating
function of a sequence of polynomials can often lead to a deeper understanding
of the sequence itself, one that might not otherwise be easily accessible. For this
reason, generating functions are studied quite extensively.

For the proofs of the following characterizations, we refer the reader to Roman
[1984].


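Theorem 19.10 (Generating function)
1) The sequence $p_n(x)$ is the associated sequence for a delta series $f(t)$ if and
only if
$$e^{x\bar{f}(t)} = \sum_{k=0}^{\infty} \frac{p_k(x)}{k!} t^k$$
where $\bar{f}(t)$ is the compositional inverse of $f(t)$.
2) The sequence $s_n(x)$ is Sheffer for $(g(t), f(t))$ if and only if
$$\frac{1}{g(\bar{f}(t))}\, e^{x\bar{f}(t)} = \sum_{k=0}^{\infty} \frac{s_k(x)}{k!} t^k$$
The sum on the right is called the generating function of $s_n(x)$.
Proof. Part 1) is a special case of part 2). For part 2), the expression above is
equivalent to
$$\frac{1}{g(t)}\, e^{yt} = \sum_{k=0}^{\infty} \frac{s_k(y)}{k!} f^k(t)$$
which is equivalent to
$$e^{yt} = \sum_{k=0}^{\infty} \frac{s_k(y)}{k!}\, g(t)f^k(t)$$
But if $s_n(x)$ is Sheffer for $(g(t), f(t))$, then this is just the expansion theorem
for $e^{yt}$. Conversely, this expression implies that
$$s_n(y) = \langle e^{yt} \mid s_n(x) \rangle = \sum_{k=0}^{\infty} \frac{s_k(y)}{k!} \langle g(t)f^k(t) \mid s_n(x) \rangle$$
and so $\langle g(t)f^k(t) \mid s_n(x) \rangle = n!\,\delta_{n,k}$, which says that $s_n(x)$ is Sheffer for
$(g, f)$. □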


We can now give a representation for Sheffer sequences. 


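Theorem 19.11 (Conjugate representation)
1) A sequence $p_n(x)$ is the associated sequence for $f(t)$ if and only if
$$p_n(x) = \sum_{k=0}^{n} \frac{1}{k!} \langle \bar{f}(t)^k \mid x^n \rangle\, x^k$$
2) A sequence $s_n(x)$ is Sheffer for $(g(t), f(t))$ if and only if
$$s_n(x) = \sum_{k=0}^{n} \frac{1}{k!} \langle g(\bar{f}(t))^{-1} \bar{f}(t)^k \mid x^n \rangle\, x^k$$
Proof. We need only prove part 2). We know that $s_n(x)$ is Sheffer for
$(g(t), f(t))$ if and only if
$$\frac{1}{g(\bar{f}(t))}\, e^{y\bar{f}(t)} = \sum_{k=0}^{\infty} \frac{s_k(y)}{k!} t^k$$
But this is equivalent to
$$\left\langle \frac{1}{g(\bar{f}(t))}\, e^{y\bar{f}(t)} \;\Big|\; x^n \right\rangle = \left\langle \sum_{k=0}^{\infty} \frac{s_k(y)}{k!} t^k \;\Big|\; x^n \right\rangle = s_n(y)$$
Expanding the exponential on the left gives
$$\sum_{k=0}^{n} \frac{\langle g(\bar{f}(t))^{-1} \bar{f}(t)^k \mid x^n \rangle}{k!}\, y^k = s_n(y)$$
where the terms with $k > n$ vanish because $o(\bar{f}(t)^k) = k > n$. Replacing $y$ by $x$
gives the result. □

Sheffer sequences can also be characterized by means of linear operators.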


Theorem 19.12 (Operator characterization)
1) A sequence $p_n(x)$ is the associated sequence for $f(t)$ if and only if
a) $p_n(0) = \delta_{n,0}$
b) $f(t)p_n(x) = np_{n-1}(x)$ for $n \ge 0$
2) A sequence $s_n(x)$ is Sheffer for $(g(t), f(t))$ for some invertible series $g(t)$ if
and only if
$$f(t)s_n(x) = ns_{n-1}(x)$$
for all $n \ge 0$.
Proof. For part 1), if $p_n(x)$ is associated with $f(t)$, then
$$p_n(0) = \langle t^0 \mid p_n(x) \rangle = \langle f(t)^0 \mid p_n(x) \rangle = 0!\,\delta_{n,0}$$
and
$$\langle f(t)^k \mid f(t)p_n(x) \rangle = \langle f(t)^{k+1} \mid p_n(x) \rangle = n!\,\delta_{n,k+1} = n(n - 1)!\,\delta_{n-1,k} = n\langle f(t)^k \mid p_{n-1}(x) \rangle$$
and since this holds for all $k \ge 0$ we get 1b). Conversely, if 1a) and 1b) hold,
then
$$\langle f(t)^k \mid p_n(x) \rangle = \langle t^0 \mid f(t)^k p_n(x) \rangle = (n)_k\, p_{n-k}(0) = (n)_k\, \delta_{n-k,0} = n!\,\delta_{n,k}$$
and so $p_n(x)$ is the associated sequence for $f(t)$.

As for part 2), if $s_n(x)$ is Sheffer for $(g(t), f(t))$, then
$$\langle g(t)f(t)^k \mid f(t)s_n(x) \rangle = \langle g(t)f(t)^{k+1} \mid s_n(x) \rangle = n!\,\delta_{n,k+1} = n(n - 1)!\,\delta_{n-1,k} = n\langle g(t)f(t)^k \mid s_{n-1}(x) \rangle$$
and so $f(t)s_n(x) = ns_{n-1}(x)$, as desired. Conversely, suppose that
$$f(t)s_n(x) = ns_{n-1}(x)$$
and let $p_n(x)$ be the associated sequence for $f(t)$. Let $T$ be the invertible linear
operator on $P$ defined by
$$Ts_n(x) = p_n(x)$$
Then
$$Tf(t)s_n(x) = nTs_{n-1}(x) = np_{n-1}(x) = f(t)p_n(x) = f(t)Ts_n(x)$$
and so $T$ commutes with the delta operator $f(t)$. Theorem 19.6 then implies that
$T = g(t)$ for some invertible series $g(t)$. Then
$$\langle g(t)f(t)^k \mid s_n(x) \rangle = \langle f(t)^k \mid g(t)s_n(x) \rangle = \langle t^0 \mid f(t)^k p_n(x) \rangle = (n)_k\, p_{n-k}(0) = (n)_k\, \delta_{n-k,0} = n!\,\delta_{n,k}$$
and so $s_n(x)$ is Sheffer for $(g(t), f(t))$. □


We next give a formula for the action of a linear operator h(t) on a Sheffer 
sequence. 


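Theorem 19.13 Let $s_n(x)$ be a Sheffer sequence for $(g(t), f(t))$ and let $p_n(x)$
be associated with $f(t)$. Then for any $h(t)$ we have
$$h(t)s_n(x) = \sum_{k=0}^{n} \binom{n}{k} \langle h(t) \mid s_k(x) \rangle\, p_{n-k}(x)$$
Proof. By the expansion theorem
$$h(t) = \sum_{k=0}^{\infty} \frac{\langle h(t) \mid s_k(x) \rangle}{k!}\, g(t)f^k(t)$$
we have
$$h(t)s_n(x) = \sum_{k=0}^{\infty} \frac{\langle h(t) \mid s_k(x) \rangle}{k!}\, g(t)f^k(t)s_n(x) = \sum_{k=0}^{n} \frac{\langle h(t) \mid s_k(x) \rangle}{k!}\, (n)_k\, p_{n-k}(x)$$
which is the desired formula. □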


Theorem 19.14
1) (The binomial identity) A sequence $p_n(x)$ is the associated sequence for a
delta series $f(t)$ if and only if it is of binomial type, that is, if and only if it
satisfies the identity
$$p_n(x + y) = \sum_{k=0}^{n} \binom{n}{k} p_k(y)\, p_{n-k}(x)$$
for all $y \in \mathbb{C}$.
2) (The Sheffer identity) A sequence $s_n(x)$ is Sheffer for $(g(t), f(t))$ for
some invertible $g(t)$ if and only if
$$s_n(x + y) = \sum_{k=0}^{n} \binom{n}{k} p_k(y)\, s_{n-k}(x)$$
for all $y \in \mathbb{C}$, where $p_n(x)$ is the associated sequence for $f(t)$.
Proof. To prove part 1), if $p_n(x)$ is an associated sequence, then taking
$h(t) = e^{yt}$ in Theorem 19.13 gives the binomial identity. Conversely, suppose
that the sequence $p_n(x)$ is of binomial type. We will use the operator
characterization to show that $p_n(x)$ is an associated sequence. Taking
$x = y = 0$ we have for $n = 0$,
$$p_0(0) = p_0(0)p_0(0)$$
and so $p_0(0) = 1$. Also,
$$p_1(0) = p_0(0)p_1(0) + p_1(0)p_0(0) = 2p_1(0)$$
and so $p_1(0) = 0$. Assuming that $p_i(0) = 0$ for $i = 1, \ldots, m - 1$ we have
$$p_m(0) = p_0(0)p_m(0) + p_m(0)p_0(0) = 2p_m(0)$$
and so $p_m(0) = 0$. Thus, $p_n(0) = \delta_{n,0}$.

Next, define a linear functional $f(t)$ by
$$\langle f(t) \mid p_n(x) \rangle = \delta_{n,1}$$
Since $\langle f(t) \mid 1 \rangle = \langle f(t) \mid p_0(x) \rangle = 0$ and $\langle f(t) \mid p_1(x) \rangle = 1 \ne 0$ we deduce
that $f(t)$ is a delta series. Now, the binomial identity gives
$$\langle f(t) \mid p_n(x + y) \rangle = \sum_{k=0}^{n} \binom{n}{k} p_k(y) \langle f(t) \mid p_{n-k}(x) \rangle = \sum_{k=0}^{n} \binom{n}{k} p_k(y)\, \delta_{n-k,1} = np_{n-1}(y)$$
and so
$$\langle e^{yt} \mid f(t)p_n(x) \rangle = \langle e^{yt} \mid np_{n-1}(x) \rangle$$
and since this holds for all $y$, we get $f(t)p_n(x) = np_{n-1}(x)$. Thus, $p_n(x)$ is the
associated sequence for $f(t)$.

For part 2), if $s_n(x)$ is a Sheffer sequence, then taking $h(t) = e^{yt}$ in Theorem
19.13 gives the Sheffer identity. Conversely, suppose that the Sheffer identity
holds, where $p_n(x)$ is the associated sequence for $f(t)$. It suffices to show that
$g(t)s_n(x) = p_n(x)$ for some invertible $g(t)$. Define a linear operator $T$ by
$$Ts_n(x) = p_n(x)$$
Then
$$e^{yt}Ts_n(x) = e^{yt}p_n(x) = p_n(x + y)$$
and by the Sheffer identity,
$$Te^{yt}s_n(x) = \sum_{k=0}^{n} \binom{n}{k} p_k(y)\, Ts_{n-k}(x) = \sum_{k=0}^{n} \binom{n}{k} p_k(y)\, p_{n-k}(x)$$
and the two are equal by part 1). Hence, $T$ commutes with $e^{yt}$ and is therefore
of the form $g(t)$, as desired. □




Examples of Sheffer Sequences 


We can now give some examples of Sheffer sequences. While it is often a 
relatively straightforward matter to verify that a given sequence is Sheffer for a 
given pair (g(t), f(t)), it is quite another matter to find the Sheffer sequence for 
a given pair. The umbral calculus provides two formulas for this purpose, one of 
which is direct, but requires the usually very difficult computation of the series 
$(f(t)/t)^{-n}$. The other is a recurrence relation that expresses each $s_n(x)$ in terms
of previous terms in the Sheffer sequence. Unfortunately, space does not permit 
us to discuss these formulas in detail. However, we will discuss the recurrence 
formula for associated sequences later in this chapter. 


Example 19.4 The sequence $p_n(x) = x^n$ is the associated sequence for the delta
series $f(t) = t$. The generating function for this sequence is
\[ e^{yt} = \sum_{k=0}^{\infty} \frac{y^k}{k!}\, t^k \]
and the binomial identity is the well-known binomial formula
\[ (x+y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k} \]

Example 19.5 The lower factorial polynomials
\[ (x)_n = x(x-1)\cdots(x-n+1) \]
form the associated sequence for the forward difference functional
\[ f(t) = e^t - 1 \]
discussed in Example 19.2. To see this, we simply compute, using Theorem
19.12. Since $(0)_0$ is defined to be 1, we have $(0)_n = \delta_{n,0}$. Also,
\[
\begin{aligned}
(e^t - 1)(x)_n &= (x+1)_n - (x)_n \\
&= [(x+1)x(x-1)\cdots(x-n+2)] - [x(x-1)\cdots(x-n+1)] \\
&= x(x-1)\cdots(x-n+2)\,[(x+1) - (x-n+1)] \\
&= n\,x(x-1)\cdots(x-n+2) \\
&= n\,(x)_{n-1}
\end{aligned}
\]
The generating function for the lower factorial polynomials is
\[ e^{y\log(1+t)} = \sum_{k=0}^{\infty} \frac{(y)_k}{k!}\, t^k \]
which can be rewritten in the more familiar form
\[ (1+t)^y = \sum_{k=0}^{\infty} \binom{y}{k} t^k \]
Of course, this is a formal identity, so there is no need to make any restrictions
on $t$. The binomial identity in this case is
\[ (x+y)_n = \sum_{k=0}^{n} \binom{n}{k} (x)_k\, (y)_{n-k} \]
which can also be written in the form
\[ \binom{x+y}{n} = \sum_{k=0}^{n} \binom{x}{k} \binom{y}{n-k} \]
This is known as the Vandermonde convolution formula.
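The two facts used in Example 19.5 are easy to test numerically. The following is a minimal sketch, not part of the text's development: the function names are ours, and checking at integer points only illustrates the polynomial identities rather than proving them.

```python
from math import comb

def falling(x, n):
    """Lower factorial (x)_n = x(x-1)...(x-n+1), with (x)_0 = 1."""
    result = 1
    for i in range(n):
        result *= x - i
    return result

def binomial_identity_holds(x, y, n):
    """Check (x+y)_n == sum_k C(n,k) (x)_k (y)_{n-k} at integers x, y."""
    rhs = sum(comb(n, k) * falling(x, k) * falling(y, n - k)
              for k in range(n + 1))
    return falling(x + y, n) == rhs

def forward_difference_holds(x, n):
    """Check (e^t - 1)(x)_n = (x+1)_n - (x)_n == n (x)_{n-1}, for n >= 1."""
    return falling(x + 1, n) - falling(x, n) == n * falling(x, n - 1)
```

Dividing the first identity by $n!$ gives the Vandermonde convolution in its binomial-coefficient form.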
Example 19.6 The Abel polynomials
\[ A_n(x;a) = x(x-an)^{n-1} \]
form the associated sequence for the Abel functional
\[ f(t) = te^{at} \]
also discussed in Example 19.2. We leave verification of this to the reader. The
generating function for the Abel polynomials is
\[ e^{y\overline{f}(t)} = \sum_{k=0}^{\infty} \frac{y(y-ak)^{k-1}}{k!}\, t^k \]
Taking the formal derivative of this with respect to $y$ gives
\[ \overline{f}(t)\,e^{y\overline{f}(t)} = \sum_{k=1}^{\infty} \frac{k(y-a)(y-ak)^{k-2}}{k!}\, t^k \]
which, for $y = 0$, gives a formula for the compositional inverse of the series
$f(t) = te^{at}$,
\[ \overline{f}(t) = \sum_{k=1}^{\infty} \frac{(-a)^{k-1}k^{k-1}}{k!}\, t^k \]
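The inverse series at the end of Example 19.6 can be checked by composing truncated power series. The sketch below assumes the formula $\overline{f}(t) = \sum_{k \ge 1} (-a)^{k-1}k^{k-1}t^k/k!$ for $f(t) = te^{at}$ and verifies $f(\overline{f}(t)) = t$ modulo $t^N$ in exact rational arithmetic. The helper names are ours.

```python
from fractions import Fraction
from math import factorial

def mul(p, q, N):
    """Product of two power series given as coefficient lists, modulo t^N."""
    r = [Fraction(0)] * N
    for i, a in enumerate(p[:N]):
        if a:
            for j, b in enumerate(q[:N - i]):
                r[i + j] += a * b
    return r

def compose(outer, inner, N):
    """outer(inner(t)) modulo t^N; requires inner[0] == 0."""
    result = [Fraction(0)] * N
    power = [Fraction(1)] + [Fraction(0)] * (N - 1)   # inner(t)^0
    for c in outer[:N]:
        result = [r + c * p for r, p in zip(result, power)]
        power = mul(power, inner, N)
    return result

def abel_series(a, N):
    """f(t) = t e^{at} modulo t^N: the coefficient of t^k is a^(k-1)/(k-1)!."""
    return [Fraction(0)] + [Fraction(a) ** (k - 1) / factorial(k - 1)
                            for k in range(1, N)]

def abel_inverse(a, N):
    """Candidate inverse: coefficient of t^k is (-a)^(k-1) k^(k-1) / k!."""
    return [Fraction(0)] + [Fraction(-a) ** (k - 1) * k ** (k - 1) / factorial(k)
                            for k in range(1, N)]
```

For example, `compose(abel_series(2, 8), abel_inverse(2, 8), 8)` should be the coefficient list of $t$.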


Example 19.7 The famous Hermite polynomials $H_n(x)$ form the Appell
sequence for the invertible functional
\[ g(t) = e^{t^2/2} \]
We ask the reader to show that $s_n(x)$ is the Appell sequence for $g(t)$ if and only
if $s_n(x) = g(t)^{-1}x^n$. Using this fact, we get
\[ H_n(x) = e^{-t^2/2}x^n = \sum_{k \ge 0} \left(-\frac{1}{2}\right)^{k} \frac{(n)_{2k}}{k!}\, x^{n-2k} \]
The generating function for the Hermite polynomials is
\[ e^{yt - t^2/2} = \sum_{k=0}^{\infty} \frac{H_k(y)}{k!}\, t^k \]
and the Sheffer identity is
\[ H_n(x+y) = \sum_{k=0}^{n} \binom{n}{k} y^k\, H_{n-k}(x) \]
We should remark that the Hermite polynomials, as defined in the literature,
often differ from our definition by a multiplicative constant. □
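As a quick check of Example 19.7, the sketch below assumes the Hermite normalization $g(t) = e^{t^2/2}$, expands $H_n(x) = e^{-t^2/2}x^n$ term by term using $t^{2k}x^n = (n)_{2k}x^{n-2k}$, and tests the Sheffer identity of Theorem 19.14 with $p_k(y) = y^k$ at rational points. The function names are ours.

```python
from fractions import Fraction
from math import comb, factorial

def hermite(n):
    """Coefficients [c_0, ..., c_n] of H_n(x) = e^{-t^2/2} x^n."""
    coeffs = [Fraction(0)] * (n + 1)
    for k in range(n // 2 + 1):
        falling = factorial(n) // factorial(n - 2 * k)   # (n)_{2k}
        coeffs[n - 2 * k] = Fraction(-1, 2) ** k * falling / factorial(k)
    return coeffs

def evaluate(coeffs, x):
    """Value of the polynomial with the given coefficient list at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))

def sheffer_identity_holds(n, x, y):
    """Check H_n(x+y) == sum_k C(n,k) y^k H_{n-k}(x)."""
    rhs = sum(comb(n, k) * y ** k * evaluate(hermite(n - k), x)
              for k in range(n + 1))
    return evaluate(hermite(n), x + y) == rhs
```

With this normalization the first few polynomials are $1$, $x$, $x^2 - 1$, $x^3 - 3x$.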


Example 19.8 The well-known and important Laguerre polynomials $L_n^{(\alpha)}(x)$
of order $\alpha$ form the Sheffer sequence for the pair
\[ g(t) = (1-t)^{-\alpha-1}, \qquad f(t) = \frac{t}{t-1} \]
It is possible to show (although we will not do so here) that
\[ L_n^{(\alpha)}(x) = \sum_{k=0}^{n} \frac{n!}{k!} \binom{\alpha+n}{n-k} (-x)^k \]
The generating function of the Laguerre polynomials is
\[ \frac{1}{(1-t)^{\alpha+1}}\, e^{yt/(t-1)} = \sum_{k=0}^{\infty} \frac{L_k^{(\alpha)}(y)}{k!}\, t^k \]
As with the Hermite polynomials, some definitions of the Laguerre polynomials
differ by a multiplicative constant. □
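The Sheffer property $f(t)s_n(x) = n s_{n-1}(x)$ can be tested for the Laguerre polynomials. The sketch below rests on two assumptions that should be checked against the reader's normalization: the explicit formula $L_n^{(\alpha)}(x) = \sum_k \frac{n!}{k!}\binom{\alpha+n}{n-k}(-x)^k$, restricted here to nonnegative integer $\alpha$, and $f(t) = t/(t-1) = -(t + t^2 + \cdots)$, which acts on a polynomial as minus the sum of all its derivatives. The function names are ours.

```python
from math import comb, factorial

def laguerre(n, alpha):
    """Coefficients of L_n^(alpha)(x) for integer alpha >= 0 (assumed formula)."""
    return [(-1) ** k * (factorial(n) // factorial(k)) * comb(alpha + n, n - k)
            for k in range(n + 1)]

def derivative(p):
    """Coefficient list of p'(x)."""
    return [i * c for i, c in enumerate(p)][1:]

def trim(p):
    """Drop trailing zero coefficients."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def apply_f(p):
    """Apply f(t) = t/(t-1) = -(t + t^2 + ...) to p, with t acting as d/dx."""
    total = [0] * len(p)
    d = derivative(p)
    while d:
        for i, c in enumerate(d):
            total[i] -= c
        d = derivative(d)
    return trim(total)

def lowering_holds(n, alpha):
    """Check f(t) L_n = n L_{n-1} for n >= 1."""
    return apply_f(laguerre(n, alpha)) == [n * c for c in laguerre(n - 1, alpha)]
```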


We presume that the few examples we have given here indicate that the umbral 
calculus applies to a significant range of important polynomial sequences. In 
Roman [1984], we discuss approximately 30 different sequences of polynomials 
that are (or are closely related to) Sheffer sequences. 


Umbral Operators and Umbral Shifts 


We have now established the basic framework of the umbral calculus. As we
have seen, the umbral algebra plays three roles: as the algebra of formal power
series in a single variable, as the algebra of all linear functionals on $\mathcal{P}$ and as the
algebra of all linear operators on $\mathcal{P}$ that commute with the derivative operator.
Moreover, since $\mathcal{F}$ is an algebra, we can consider geometric sequences
\[ g,\ gf,\ gf^2,\ gf^3,\ \ldots \]
in $\mathcal{F}$, where $o(g) = 0$ and $o(f) = 1$. We have seen by example that the
orthogonality conditions
\[ \langle g(t)f(t)^k \mid s_n(x)\rangle = n!\,\delta_{n,k} \]
define important families of polynomial sequences.

While the machinery that we have developed so far does unify a number of
topics from the classical study of polynomial sequences (for example, special
cases of the expansion theorem include Taylor's expansion, the Euler-MacLaurin
formula and Boole's summation formula), it does not provide much
new insight into their study. Our plan now is to take a brief look at some of the
deeper results in the umbral calculus, which center on the interplay between
operators on $\mathcal{P}$ and their adjoints, which are operators on the umbral algebra
$\mathcal{F} = \mathcal{P}^*$.


We begin by defining two important operators on P associated with each 
Sheffer sequence. 


Definition Let $s_n(x)$ be Sheffer for $(g(t), f(t))$. The linear operator
$\lambda_{g,f}\colon \mathcal{P} \to \mathcal{P}$ defined by
\[ \lambda_{g,f}(x^n) = s_n(x) \]
is called the Sheffer operator for the pair $(g(t), f(t))$, or for the sequence
$s_n(x)$. If $p_n(x)$ is the associated sequence for $f(t)$, the Sheffer operator
\[ \lambda_f(x^n) = p_n(x) \]
is called the umbral operator for $f(t)$, or for $p_n(x)$. □

Definition Let $s_n(x)$ be Sheffer for $(g(t), f(t))$. The linear operator
$\theta_{g,f}\colon \mathcal{P} \to \mathcal{P}$ defined by
\[ \theta_{g,f}[s_n(x)] = s_{n+1}(x) \]
is called the Sheffer shift for the pair $(g(t), f(t))$, or for the sequence $s_n(x)$. If
$p_n(x)$ is the associated sequence for $f(t)$, the Sheffer shift
\[ \theta_f[p_n(x)] = p_{n+1}(x) \]
is called the umbral shift for $f(t)$, or for $p_n(x)$. □




It is clear that each Sheffer sequence uniquely determines a Sheffer operator and 
vice versa. Hence, knowing the Sheffer operator of a sequence is equivalent to 
knowing the sequence. 


Continuous Operators on the Umbral Algebra 


It is clearly desirable that a linear operator $T$ on the umbral algebra $\mathcal{F}$ pass
under infinite sums, that is, that
\[ T\Big(\sum_{k=0}^{\infty} a_k f_k(t)\Big) = \sum_{k=0}^{\infty} a_k T[f_k(t)] \tag{19.6} \]
whenever the sum on the left is defined, which is precisely when $o(f_k(t)) \to \infty$
as $k \to \infty$. Not all operators on $\mathcal{F}$ have this property, which leads to the
following definition.

Definition A linear operator $T$ on the umbral algebra $\mathcal{F}$ is continuous if it
satisfies (19.6). □

The term continuous can be justified by defining a topology on $\mathcal{F}$. However,
since no additional topological concepts will be needed, we will not do so here.
Note that in order for (19.6) to make sense, we must have $o(T[f_k(t)]) \to \infty$. It
turns out that this condition is also sufficient.


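Theorem 19.15 A linear operator $T$ on $\mathcal{F}$ is continuous if and only if
\[ o(f_k) \to \infty \ \Rightarrow\ o(T(f_k)) \to \infty \tag{19.7} \]
Proof. The necessity is clear. Suppose that (19.7) holds and that $o(f_k) \to \infty$.
For any $n \ge 0$ and $m \ge 0$, we have
\[ \Big\langle T\sum_{k \ge 0} a_k f_k(t) \,\Big|\, x^n \Big\rangle
 = \Big\langle T\sum_{k=0}^{m} a_k f_k(t) \,\Big|\, x^n \Big\rangle
 + \Big\langle T\sum_{k>m} a_k f_k(t) \,\Big|\, x^n \Big\rangle \tag{19.8} \]
Since
\[ o\Big(\sum_{k>m} a_k f_k(t)\Big) \to \infty \quad \text{as } m \to \infty \]
(19.7) implies that we may choose $m$ large enough that
\[ o\Big(T\sum_{k>m} a_k f_k(t)\Big) > n \]
and
\[ o(T[f_k(t)]) > n \quad \text{for } k > m \]
Hence, (19.8) gives
\[
\begin{aligned}
\Big\langle T\sum_{k \ge 0} a_k f_k(t) \,\Big|\, x^n \Big\rangle
 &= \Big\langle T\sum_{k=0}^{m} a_k f_k(t) \,\Big|\, x^n \Big\rangle \\
 &= \Big\langle \sum_{k=0}^{m} a_k T[f_k(t)] \,\Big|\, x^n \Big\rangle \\
 &= \Big\langle \sum_{k \ge 0} a_k T[f_k(t)] \,\Big|\, x^n \Big\rangle
\end{aligned}
\]
which implies the desired result. □
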
Operator Adjoints 


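If $\tau\colon \mathcal{P} \to \mathcal{P}$ is a linear operator on $\mathcal{P}$, then its (operator) adjoint $\tau^{\times}$ is an
operator on $\mathcal{P}^* = \mathcal{F}$ defined by
\[ \tau^{\times}[h(t)] = h(t) \circ \tau \]
In the symbolism of the umbral calculus, this is
\[ \langle \tau^{\times}h(t) \mid p(x)\rangle = \langle h(t) \mid \tau p(x)\rangle \]
(We have reduced the number of parentheses used to aid clarity.)
Let us recall the basic properties of the adjoint from Chapter 3.

Theorem 19.16 For $\tau, \sigma \in \mathcal{L}(\mathcal{P})$,
1) $(\tau + \sigma)^{\times} = \tau^{\times} + \sigma^{\times}$
2) $(r\tau)^{\times} = r\tau^{\times}$ for any $r \in \mathbb{C}$
3) $(\tau\sigma)^{\times} = \sigma^{\times}\tau^{\times}$
4) $(\tau^{-1})^{\times} = (\tau^{\times})^{-1}$ for any invertible $\tau \in \mathcal{L}(\mathcal{P})$ □

Thus, the map $\phi\colon \mathcal{L}(\mathcal{P}) \to \mathcal{L}(\mathcal{F})$ that sends $\tau\colon \mathcal{P} \to \mathcal{P}$ to its adjoint $\tau^{\times}\colon \mathcal{F} \to \mathcal{F}$
is a linear transformation from $\mathcal{L}(\mathcal{P})$ to $\mathcal{L}(\mathcal{F})$. Moreover, since $\tau^{\times} = 0$ implies
that $\langle h(t) \mid \tau p(x)\rangle = 0$ for all $h(t) \in \mathcal{F}$ and $p(x) \in \mathcal{P}$, which in turn implies
that $\tau = 0$, we deduce that $\phi$ is injective. The next theorem describes the range
of $\phi$.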


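Theorem 19.17 A linear operator $T \in \mathcal{L}(\mathcal{F})$ is the adjoint of a linear operator
$\tau \in \mathcal{L}(\mathcal{P})$ if and only if $T$ is continuous.

Proof. First, suppose that $T = \tau^{\times}$ for some $\tau \in \mathcal{L}(\mathcal{P})$ and let $o(f_k(t)) \to \infty$. If
$n \ge 0$, then for all $0 \le i \le n$ we have
\[ \langle \tau^{\times}f_k(t) \mid x^i\rangle = \langle f_k(t) \mid \tau x^i\rangle \]
and so it is only necessary to take $k$ large enough that $o(f_k(t)) > \deg \tau(x^i)$ for
all $0 \le i \le n$, whence
\[ \langle \tau^{\times}f_k(t) \mid x^i\rangle = 0 \]
for all $0 \le i \le n$ and so $o(\tau^{\times}f_k(t)) > n$. Thus, $o(\tau^{\times}f_k(t)) \to \infty$ and $\tau^{\times}$ is
continuous.

For the converse, assume that $T$ is continuous. If $T$ did have the form $\tau^{\times}$, then
\[ \langle Tt^k \mid x^n\rangle = \langle \tau^{\times}t^k \mid x^n\rangle = \langle t^k \mid \tau x^n\rangle \]
and since
\[ \tau x^n = \sum_{k \ge 0} \frac{\langle t^k \mid \tau x^n\rangle}{k!}\, x^k \]
we are prompted to define $\tau$ by
\[ \tau x^n = \sum_{k \ge 0} \frac{\langle Tt^k \mid x^n\rangle}{k!}\, x^k \]
This makes sense since $o(Tt^k) \to \infty$ as $k \to \infty$ and so the sum on the right is a
finite sum. Then
\[ \langle \tau^{\times}t^m \mid x^n\rangle = \langle t^m \mid \tau x^n\rangle
 = \sum_{k \ge 0} \frac{\langle Tt^k \mid x^n\rangle}{k!}\, \langle t^m \mid x^k\rangle = \langle Tt^m \mid x^n\rangle \]
which implies that $Tt^m = \tau^{\times}t^m$ for all $m \ge 0$. Finally, since $T$ and $\tau^{\times}$ are both
continuous, we have $T = \tau^{\times}$. □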
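The construction of $\tau$ from $T$ in the converse direction can be tried on a concrete continuous operator. The sketch below uses $T = d/dt$ on the umbral algebra as an example of our own choosing, encodes $Tt^k$ as a dictionary from exponents to coefficients, and uses $\langle t^j \mid x^n\rangle = n!\,\delta_{j,n}$ to compute the coefficients of $\tau x^n = \sum_k \langle Tt^k \mid x^n\rangle x^k/k!$. The function names and the `max_k` cutoff are hypothetical; the caller must pick `max_k` large enough that $o(Tt^k) > n$ for $k >$ `max_k`.

```python
from fractions import Fraction
from math import factorial

def derivative_on_power(k):
    """T t^k for T = d/dt, as {exponent: coefficient}."""
    return {k - 1: k} if k >= 1 else {}

def tau_on_monomial(T_on_power, n, max_k):
    """Coefficients [c_0, ..., c_max_k] of tau x^n = sum_k <T t^k | x^n>/k! x^k."""
    coeffs = []
    for k in range(max_k + 1):
        # <T t^k | x^n> = n! * (coefficient of t^n in T t^k)
        pairing = factorial(n) * T_on_power(k).get(n, 0)
        coeffs.append(Fraction(pairing, factorial(k)))
    return coeffs
```

For $T = d/dt$ only $k = n+1$ contributes, so `tau_on_monomial(derivative_on_power, n, n + 1)` is the coefficient list of $x^{n+1}$: the operator on polynomials whose adjoint is differentiation with respect to $t$ is multiplication by $x$.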


Umbral Operators and Automorphisms of the Umbral Algebra 


Figure 19.1 shows the map $\phi$, which is an isomorphism from the vector space
$\mathcal{L}(\mathcal{P})$ onto the space of all continuous linear operators on $\mathcal{F}$. We are interested
in determining the images under this isomorphism of the set of umbral operators
and the set of umbral shifts, as pictured in Figure 19.1.

[Figure 19.1: the isomorphism from $\mathcal{L}(\mathcal{P})$, containing the umbral operators and the umbral shifts, onto the continuous linear operators on $\mathcal{F}$, containing the derivations on $\mathcal{F}$.]


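Let us begin with umbral operators. Suppose that $\lambda_f$ is the umbral operator for
the associated sequence $p_n(x)$, with delta series $f(t) \in \mathcal{F}$. Then
\[ \langle \lambda_f^{\times}f(t)^k \mid x^n\rangle = \langle f(t)^k \mid \lambda_f x^n\rangle = \langle f(t)^k \mid p_n(x)\rangle = n!\,\delta_{n,k} = \langle t^k \mid x^n\rangle \]
for all $k$ and $n$. Hence, $\lambda_f^{\times}f(t)^k = t^k$ and the continuity of $\lambda_f^{\times}$ implies that
\[ \lambda_f^{\times}t^k = \overline{f}(t)^k \]
More generally, for any $h(t) \in \mathcal{F}$,
\[ \lambda_f^{\times}h(t) = h(\overline{f}(t)) \tag{19.9} \]
In words, $\lambda_f^{\times}$ is composition by $\overline{f}(t)$.

From (19.9), we deduce that $\lambda_f^{\times}$ is a vector space isomorphism and that
\[ \lambda_f^{\times}[g(t)h(t)] = g(\overline{f}(t))\,h(\overline{f}(t)) = \lambda_f^{\times}g(t)\,\lambda_f^{\times}h(t) \]
Hence, $\lambda_f^{\times}$ is an automorphism of the umbral algebra $\mathcal{F}$. It is a pleasant fact that
this characterizes umbral operators. The first step in the proof of this is the
following, whose proof is left as an exercise.

Theorem 19.18 If $T$ is an automorphism of the umbral algebra, then $T$
preserves order, that is, $o(Tf(t)) = o(f(t))$. In particular, $T$ is continuous. □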


Theorem 19.19 A linear operator $\lambda$ on $\mathcal{P}$ is an umbral operator if and only if
its adjoint is an automorphism of the umbral algebra $\mathcal{F}$. Moreover, if $\lambda_f$ is an
umbral operator, then
\[ \lambda_f^{\times}h(t) = h(\overline{f}(t)) \]
for all $h(t) \in \mathcal{F}$. In particular, $\lambda_f^{\times}f(t) = t$.

Proof. We have already shown that the adjoint of $\lambda_f$ is an automorphism
satisfying (19.9). For the converse, suppose that $\lambda^{\times}$ is an automorphism of $\mathcal{F}$.
Since $\lambda^{\times}$ is surjective, there is a unique series $f(t)$ for which $\lambda^{\times}f(t) = t$.
Moreover, Theorem 19.18 implies that $f(t)$ is a delta series. Thus,
\[ n!\,\delta_{n,k} = \langle t^k \mid x^n\rangle = \langle \lambda^{\times}f(t)^k \mid x^n\rangle = \langle f(t)^k \mid \lambda x^n\rangle \]
which shows that $\lambda x^n$ is the associated sequence for $f(t)$ and hence that $\lambda$ is an
umbral operator. □


Theorem 19.19 allows us to fill in one of the boxes on the right side of Figure 
19.1. Let us see how we might use Theorem 19.19 to advantage in the study of 
associated sequences. 


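We have seen that the isomorphism $\lambda \mapsto \lambda^{\times}$ maps the set $\mathcal{U}$ of umbral operators
on $\mathcal{P}$ onto the set $\operatorname{aut}(\mathcal{F})$ of automorphisms of $\mathcal{F} = \mathcal{P}^*$. But $\operatorname{aut}(\mathcal{F})$ is a group
under composition. So if
\[ \lambda_f\colon x^n \to p_n(x) \quad \text{and} \quad \lambda_g\colon x^n \to q_n(x) \]
are umbral operators, then since
\[ (\lambda_g \circ \lambda_f)^{\times} = \lambda_f^{\times} \circ \lambda_g^{\times} \]
is an automorphism of $\mathcal{F}$, it follows that the composition $\lambda_g \circ \lambda_f$ is an umbral
operator. In fact, since
\[ (\lambda_g \circ \lambda_f)^{\times} f(g(t)) = \lambda_f^{\times} \circ \lambda_g^{\times} f(g(t)) = \lambda_f^{\times} f(t) = t \]
we deduce that $\lambda_g \circ \lambda_f = \lambda_{f \circ g}$. Also, since
\[ \lambda_f \circ \lambda_{\overline{f}} = \lambda_{\overline{f} \circ f} = \lambda_t = \iota \]
we have $\lambda_f^{-1} = \lambda_{\overline{f}}$.
Thus, the set $\mathcal{U}$ of umbral operators is a group under composition with
\[ \lambda_g \circ \lambda_f = \lambda_{f \circ g} \]
and
\[ \lambda_f^{-1} = \lambda_{\overline{f}} \]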

Let us see how this plays out with respect to associated sequences. If the
associated sequence for $f(t)$ is
\[ p_n(x) = \sum_{k=0}^{n} p_{n,k}\, x^k \]
then $\lambda_f\colon x^n \to p_n(x)$ and so $\lambda_{f \circ g} = \lambda_g \circ \lambda_f$ is the umbral operator for the
associated sequence
\[ (\lambda_g \circ \lambda_f)x^n = \lambda_g p_n(x) = \sum_{k=0}^{n} p_{n,k}\, \lambda_g x^k = \sum_{k=0}^{n} p_{n,k}\, q_k(x) \]
This sequence, denoted by
\[ p_n(q(x)) = \sum_{k=0}^{n} p_{n,k}\, q_k(x) \tag{19.10} \]
is called the umbral composition of $p_n(x)$ with $q_n(x)$. The umbral operator
$\lambda_{\overline{f}} = \lambda_f^{-1}$ is the umbral operator for the associated sequence $r_n(x) = \sum r_{n,k}x^k$
where
\[ \lambda_f^{-1}x^n = r_n(x) \]
and so
\[ x^n = \sum_{k=0}^{n} r_{n,k}\, p_k(x) \]

Let us summarize. 


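Theorem 19.20
1) The set $\mathcal{U}$ of umbral operators on $\mathcal{P}$ is a group under composition, with
\[ \lambda_g \circ \lambda_f = \lambda_{f \circ g} \quad \text{and} \quad \lambda_f^{-1} = \lambda_{\overline{f}} \]
2) The set of associated sequences forms a group under umbral composition
\[ p_n(q(x)) = \sum_{k=0}^{n} p_{n,k}\, q_k(x) \]
In particular, the umbral composition $p_n(q(x))$ is the associated sequence
for the composition $f \circ g$, that is,
\[ \lambda_{f \circ g}\colon x^n \to p_n(q(x)) \]
The identity is the sequence $x^n$ and the inverse of $p_n(x)$ is the associated
sequence for the compositional inverse $\overline{f}(t)$.
3) Let $\lambda_f \in \mathcal{U}$ and $g(t) \in \mathcal{F}$. Then as operators,
\[ \lambda_f^{\times}g(t) = \lambda_f^{-1}\, g(t)\, \lambda_f \]
4) Let $\lambda_f \in \mathcal{U}$ and $g(t) \in \mathcal{F}$. Then
\[ \lambda_f\, g(\overline{f}(t)) = g(t)\, \lambda_f \]
Proof. We prove 3) as follows. For any $h(t) \in \mathcal{F}$ and $p(x) \in \mathcal{P}$, since $\lambda_f^{\times}$ is an
automorphism of $\mathcal{F}$ with inverse $(\lambda_f^{-1})^{\times}$,
\[
\begin{aligned}
\langle h(t) \mid [\lambda_f^{\times}g(t)]\,p(x)\rangle
 &= \langle [\lambda_f^{\times}g(t)]\,h(t) \mid p(x)\rangle \\
 &= \langle \lambda_f^{\times}[g(t)\,(\lambda_f^{-1})^{\times}h(t)] \mid p(x)\rangle \\
 &= \langle g(t)\,(\lambda_f^{-1})^{\times}h(t) \mid \lambda_f p(x)\rangle \\
 &= \langle (\lambda_f^{-1})^{\times}h(t) \mid g(t)\lambda_f p(x)\rangle \\
 &= \langle h(t) \mid \lambda_f^{-1}g(t)\lambda_f p(x)\rangle
\end{aligned}
\]
which gives the desired result. Part 4) follows immediately from part 3) since $\lambda_f^{\times}$
is composition by $\overline{f}$. □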


Sheffer Operators 
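If $s_n(x)$ is Sheffer for $(g, f)$, then the linear operator $\lambda_{g,f}$ defined by
\[ \lambda_{g,f}(x^n) = s_n(x) \]
is called a Sheffer operator. Sheffer operators are closely related to umbral
operators, since if $p_n(x)$ is associated with $f(t)$, then
\[ s_n(x) = g^{-1}(t)p_n(x) = g^{-1}(t)\lambda_f x^n \]
and so
\[ \lambda_{g,f} = g^{-1}(t)\lambda_f \]
It follows that the Sheffer operators form a group with composition
\[
\begin{aligned}
\lambda_{g,f} \circ \lambda_{h,k} &= g^{-1}(t)\,\lambda_f\, h^{-1}(t)\,\lambda_k \\
 &= g^{-1}(t)\, h^{-1}(f(t))\, \lambda_f \lambda_k \\
 &= [g(t)h(f(t))]^{-1}\, \lambda_{k \circ f} \\
 &= \lambda_{g \cdot (h \circ f),\, k \circ f}
\end{aligned}
\]
and inverse
\[ \lambda_{g,f}^{-1} = \lambda_{g^{-1}(\overline{f}),\, \overline{f}} \]
From this, we deduce that the umbral composition of Sheffer sequences is a
Sheffer sequence. In particular, if $s_n(x)$ is Sheffer for $(g, f)$ and
$t_n(x) = \sum t_{n,k}x^k$ is Sheffer for $(h, k)$, then
\[
\begin{aligned}
\lambda_{g,f} \circ \lambda_{h,k}(x^n) &= \sum_{k=0}^{n} t_{n,k}\, \lambda_{g,f}\, x^k \\
 &= \sum_{k=0}^{n} t_{n,k}\, s_k(x) \\
 &= t_n(s(x))
\end{aligned}
\]
is Sheffer for $(g \cdot (h \circ f),\, k \circ f)$.
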
Umbral Shifts and Derivations of the Umbral Algebra 


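We have seen that an operator on $\mathcal{P}$ is an umbral operator if and only if its
adjoint is an automorphism of $\mathcal{F}$. Now suppose that $\theta_f \in \mathcal{L}(\mathcal{P})$ is the umbral
shift for the associated sequence $p_n(x)$, associated with the delta series
$f(t) \in \mathcal{F}$. Then
\[
\begin{aligned}
\langle \theta_f^{\times}f(t)^k \mid p_n(x)\rangle &= \langle f(t)^k \mid \theta_f p_n(x)\rangle \\
 &= \langle f(t)^k \mid p_{n+1}(x)\rangle \\
 &= (n+1)!\,\delta_{n+1,k} \\
 &= k(k-1)!\,\delta_{n,k-1} \\
 &= \langle k f(t)^{k-1} \mid p_n(x)\rangle
\end{aligned}
\]
and so
\[ \theta_f^{\times}f(t)^k = k f(t)^{k-1} \tag{19.11} \]
This implies that
\[ \theta_f^{\times}[f(t)^k f(t)^j] = \theta_f^{\times}[f(t)^k]\, f(t)^j + f(t)^k\, \theta_f^{\times}[f(t)^j] \tag{19.12} \]
and further, by continuity, that
\[ \theta_f^{\times}[g(t)h(t)] = [\theta_f^{\times}g(t)]\,h(t) + g(t)\,[\theta_f^{\times}h(t)] \tag{19.13} \]
Let us pause for a definition.

Definition Let $\mathcal{A}$ be an algebra. A linear operator $\partial$ on $\mathcal{A}$ is a derivation if
\[ \partial(ab) = (\partial a)b + a\,\partial b \]
for all $a, b \in \mathcal{A}$. □

Thus, we have shown that the adjoint of an umbral shift is a derivation of the
umbral algebra $\mathcal{F}$. Moreover, the expansion theorem and (19.11) show that $\theta_f^{\times}$
is surjective. This characterizes umbral shifts. First we need a preliminary result
on surjective derivations.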




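Theorem 19.21 Let $\partial$ be a surjective derivation on the umbral algebra $\mathcal{F}$. Then
$\partial c = 0$ for any constant $c \in \mathcal{F}$ and $o(\partial f(t)) = o(f(t)) - 1$, if $o(f(t)) \ge 1$. In
particular, $\partial$ is continuous.

Proof. We begin by noting that
\[ \partial 1 = \partial 1^2 = \partial 1 + \partial 1 = 2\,\partial 1 \]
and so $\partial c = c\,\partial 1 = 0$ for all constants $c \in \mathcal{F}$. Since $\partial$ is surjective, there must
exist an $h(t) \in \mathcal{F}$ for which
\[ \partial h(t) = 1 \]
Writing $h(t) = h_0 + t h_1(t)$, we have
\[ 1 = \partial[h_0 + t h_1(t)] = (\partial t)h_1(t) + t\,\partial h_1(t) \]
which implies that $o(\partial t) = 0$. Finally, if $o(h(t)) = k \ge 1$, then $h(t) = t^k h_1(t)$,
where $o(h_1(t)) = 0$ and so
\[ o[\partial h(t)] = o[\partial\, t^k h_1(t)] = o[t^k\,\partial h_1(t) + k t^{k-1}h_1(t)\,\partial t] = k - 1 \qquad \Box \]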


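Theorem 19.22 A linear operator $\theta$ on $\mathcal{P}$ is an umbral shift if and only if its
adjoint is a surjective derivation of the umbral algebra $\mathcal{F}$. Moreover, if $\theta_f$ is an
umbral shift, then $\theta_f^{\times} = \partial_f$ is derivation with respect to $f(t)$, that is,
\[ \theta_f^{\times}f(t)^k = k f(t)^{k-1} \]
for all $k \ge 0$. In particular, $\theta_f^{\times}f(t) = 1$.
Proof. We have already seen that $\theta_f^{\times}$ is derivation with respect to $f(t)$. For the
converse, suppose that $\theta^{\times}$ is a surjective derivation. Theorem 19.21 implies that
there is a delta functional $f(t)$ such that $\theta^{\times}f(t) = 1$. If $p_n(x)$ is the associated
sequence for $f(t)$, then
\[
\begin{aligned}
\langle f(t)^k \mid \theta p_n(x)\rangle &= \langle \theta^{\times}f(t)^k \mid p_n(x)\rangle \\
 &= \langle k f(t)^{k-1}\,\theta^{\times}f(t) \mid p_n(x)\rangle \\
 &= \langle k f(t)^{k-1} \mid p_n(x)\rangle \\
 &= (n+1)!\,\delta_{n+1,k}
\end{aligned}
\]
Hence, $\theta p_n(x) = p_{n+1}(x)$, that is, $\theta = \theta_f$ is the umbral shift for $p_n(x)$. □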


We have seen that the fact that the set of all automorphisms on F is a group 
under composition shows that the set of all associated sequences is a group 
under umbral composition. The set of all surjective derivations on F does not 
form a group. However, we do have the chain rule for derivations! 




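Theorem 19.23 (The chain rule) Let $\partial_f$ and $\partial_g$ be surjective derivations on $\mathcal{F}$.
Then
\[ \partial_g = (\partial_g f(t))\,\partial_f \]
Proof. This follows from
\[ \partial_g f(t)^k = k f(t)^{k-1}\,\partial_g f(t) = (\partial_g f(t))\,\partial_f f(t)^k \]
and so continuity implies the result. □
The chain rule leads to the following umbral result.

Theorem 19.24 If $\theta_f$ and $\theta_g$ are umbral shifts, then
\[ \theta_g = \theta_f \circ \partial_g f(t) \]
Proof. Taking adjoints in the chain rule gives
\[ \theta_g = \theta_f \circ (\partial_g f(t))^{\times} = \theta_f \circ \partial_g f(t) \qquad \Box \]

We leave it as an exercise to show that $\partial_g f(t) = [\partial_f g(t)]^{-1}$. Now, by taking
$g(t) = t$ in Theorem 19.24 and observing that $\theta_t x^n = x^{n+1}$ and so $\theta_t$ is
multiplication by $x$, we get
\[ \theta_f = x\,\partial_f t = x[\partial_t f(t)]^{-1} = x[f'(t)]^{-1} \]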


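Applying this to the associated sequence $p_n(x)$ for $f(t)$ gives the following
important recurrence relation for $p_n(x)$.

Theorem 19.25 (The recurrence formula) Let $p_n(x)$ be the associated
sequence for $f(t)$. Then
1) $p_{n+1}(x) = x[f'(t)]^{-1}p_n(x)$
2) $p_{n+1}(x) = x\lambda_f[\overline{f}(t)]'x^n$
Proof. The first part is proved. As to the second, using Theorem 19.20 we have
\[
\begin{aligned}
p_{n+1}(x) &= x[f'(t)]^{-1}p_n(x) \\
 &= x[f'(t)]^{-1}\lambda_f x^n \\
 &= x\lambda_f[f'(\overline{f}(t))]^{-1}x^n \\
 &= x\lambda_f[\overline{f}(t)]'x^n \qquad \Box
\end{aligned}
\]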

Example 19.9 The recurrence relation can be used to find the associated
sequence for the forward difference functional $f(t) = e^t - 1$. Since $f'(t) = e^t$,
the recurrence relation is
\[ p_{n+1}(x) = xe^{-t}p_n(x) = xp_n(x-1) \]
Using the fact that $p_0(x) = 1$, we have
\[ p_1(x) = x, \quad p_2(x) = x(x-1), \quad p_3(x) = x(x-1)(x-2) \]
and so on, leading easily to the lower factorial polynomials
\[ p_n(x) = x(x-1)\cdots(x-n+1) = (x)_n \qquad \Box \]
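The recurrence of Example 19.9 is easy to run mechanically. The sketch below represents a polynomial by its coefficient list (constant term first), implements $e^{-t}p(x) = p(x-1)$ by binomial expansion, and iterates $p_{n+1}(x) = x\,p_n(x-1)$ from $p_0(x) = 1$. The function names are ours.

```python
from math import comb

def shift_down(p):
    """Coefficients of p(x - 1), that is, of e^{-t} p(x)."""
    q = [0] * len(p)
    for i, c in enumerate(p):
        for j in range(i + 1):
            # (x - 1)^i contributes C(i, j) (-1)^(i-j) x^j
            q[j] += c * comb(i, j) * (-1) ** (i - j)
    return q

def next_poly(p):
    """One step of the recurrence p_{n+1}(x) = x * p_n(x - 1)."""
    return [0] + shift_down(p)

def associated_sequence(count):
    """The first `count` polynomials p_0, p_1, ..., starting from p_0 = 1."""
    seq = [[1]]
    while len(seq) < count:
        seq.append(next_poly(seq[-1]))
    return seq
```

For instance, the entry of index 3 in `associated_sequence(4)` is `[0, 2, -3, 1]`, the coefficient list of $x(x-1)(x-2) = x^3 - 3x^2 + 2x$.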
Example 19.10 Consider the delta functional
\[ f(t) = \log(1+t) \]
Since $\overline{f}(t) = e^t - 1$ is the forward difference functional, Theorem 19.20 implies
that the associated sequence $\phi_n(x)$ for $f(t)$ is the inverse, under umbral
composition, of the lower factorial polynomials. Thus, if we write
\[ \phi_n(x) = \sum_{k=0}^{n} S(n,k)\,x^k \]
then
\[ x^n = \sum_{k=0}^{n} S(n,k)\,(x)_k \]
The coefficients $S(n,k)$ in this equation are known as the Stirling numbers of
the second kind and have great combinatorial significance. In fact, $S(n,k)$ is
the number of partitions of a set of size $n$ into $k$ blocks. The polynomials $\phi_n(x)$
are called the exponential polynomials.

The recurrence relation for the exponential polynomials is
\[ \phi_{n+1}(x) = x(1+t)\phi_n(x) = x(\phi_n(x) + \phi_n'(x)) \]
Equating coefficients of $x^k$ on both sides of this gives the well-known formula
for the Stirling numbers
\[ S(n+1,k) = S(n,k-1) + kS(n,k) \]
Many other properties of the Stirling numbers can be derived by umbral
means. □
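The Stirling recurrence of Example 19.10 and the expansion of $x^n$ in lower factorials can be checked together. The sketch below builds a table from $S(n+1,k) = S(n,k-1) + kS(n,k)$ with $S(0,0) = 1$ and then tests $x^n = \sum_k S(n,k)(x)_k$ at integer points. The function names are ours.

```python
def stirling2_table(N):
    """S[n][k] for 0 <= k, n <= N, built from S(n+1,k) = S(n,k-1) + k S(n,k)."""
    S = [[0] * (N + 1) for _ in range(N + 1)]
    S[0][0] = 1
    for n in range(N):
        for k in range(1, n + 2):
            S[n + 1][k] = S[n][k - 1] + k * S[n][k]
    return S

def falling(x, k):
    """Lower factorial (x)_k = x(x-1)...(x-k+1), with (x)_0 = 1."""
    result = 1
    for i in range(k):
        result *= x - i
    return result

def power_via_stirling(S, n, x):
    """Evaluate sum_k S(n,k) (x)_k, which should equal x**n."""
    return sum(S[n][k] * falling(x, k) for k in range(n + 1))
```

Row 4 of the table is $0, 1, 7, 6, 1$, matching the counts of partitions of a 4-element set into $k$ blocks.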


Now we have the analog of part 3) of Theorem 19.20. 


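Theorem 19.26 Let $\theta_f$ be an umbral shift. Then
\[ \theta_f^{\times}g(t) = g(t)\theta_f - \theta_f g(t) \]
Proof. We have
\[
\begin{aligned}
\langle f(t)^k \mid [\theta_f^{\times}g(t)]\,p_n(x)\rangle
 &= \langle [\theta_f^{\times}g(t)]\,f(t)^k \mid p_n(x)\rangle \\
 &= \langle \theta_f^{\times}[g(t)f(t)^k] - g(t)\,\theta_f^{\times}f(t)^k \mid p_n(x)\rangle \\
 &= \langle \theta_f^{\times}[g(t)f(t)^k] \mid p_n(x)\rangle - \langle k g(t)f(t)^{k-1} \mid p_n(x)\rangle \\
 &= \langle g(t)f(t)^k \mid \theta_f p_n(x)\rangle - \langle k f(t)^{k-1} \mid g(t)p_n(x)\rangle \\
 &= \langle f(t)^k \mid g(t)\theta_f p_n(x)\rangle - \langle \theta_f^{\times}f(t)^k \mid g(t)p_n(x)\rangle \\
 &= \langle f(t)^k \mid g(t)\theta_f p_n(x)\rangle - \langle f(t)^k \mid \theta_f g(t)p_n(x)\rangle
\end{aligned}
\]
from which the result follows. □

If $f(t) = t$, then $\theta_f$ is multiplication by $x$ and $\theta_f^{\times}$ is the derivative with respect to
$t$ and so the previous result becomes
\[ g'(t) = g(t)x - xg(t) \]
as operators on $\mathcal{P}$. The right side of this is called the Pincherle derivative of
the operator g(t). (See [104].)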
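The Pincherle derivative identity $g'(t) = g(t)x - xg(t)$ can be tested on polynomials. The sketch below assumes $g(t)$ is given by finitely many coefficients $c_0, c_1, \ldots$ (a polynomial in $t$, which is our simplification), lets $t$ act as $d/dx$ on coefficient lists, and compares both sides on a sample polynomial. The function names are ours.

```python
from fractions import Fraction

def derivative(p):
    """Coefficient list of p'(x)."""
    return [i * c for i, c in enumerate(p)][1:]

def trim(p):
    """Drop trailing zero coefficients."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def apply_series(c, p):
    """Apply g(t) = sum_j c[j] t^j to the polynomial p, with t acting as d/dx."""
    result = [Fraction(0)] * len(p)
    d = list(p)
    for cj in c:
        if not d:
            break
        for i, a in enumerate(d):
            result[i] += cj * a
        d = derivative(d)
    return result

def times_x(p):
    """Coefficient list of x * p(x)."""
    return [0] + list(p)

def series_derivative(c):
    """Coefficients of g'(t)."""
    return [j * cj for j, cj in enumerate(c)][1:]

def pincherle(c, p):
    """(g(t) x - x g(t)) applied to p."""
    left = apply_series(c, times_x(p))
    right = times_x(apply_series(c, p))
    return trim(a - b for a, b in zip(left, right))
```

Comparing `pincherle(c, p)` with `trim(apply_series(series_derivative(c), p))` for any coefficient list `c` and polynomial `p` illustrates the operator identity.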


Sheffer Shifts 
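Recall that the linear map
\[ \theta_{g,f}[s_n(x)] = s_{n+1}(x) \]
where $s_n(x)$ is Sheffer for $(g(t), f(t))$ is called a Sheffer shift. If $p_n(x)$ is
associated with $f(t)$, then $g(t)s_n(x) = p_n(x)$ and so
\[ g^{-1}(t)p_{n+1}(x) = \theta_{g,f}\,g^{-1}(t)p_n(x) \]
and so
\[ \theta_{g,f} = g^{-1}(t)\,\theta_f\, g(t) \]
From Theorem 19.26, the recurrence formula and the chain rule, we have
\[
\begin{aligned}
\theta_{g,f} &= g^{-1}(t)\,\theta_f\, g(t) \\
 &= g^{-1}(t)\,[g(t)\theta_f - \theta_f^{\times}g(t)] \\
 &= \theta_f - g^{-1}(t)\,\theta_f^{\times}g(t) \\
 &= \theta_f - g^{-1}(t)\,\partial_f g(t) \\
 &= \theta_f - g^{-1}(t)\,(\partial_f t)\,\partial_t g(t) \\
 &= x[f'(t)]^{-1} - g^{-1}(t)[f'(t)]^{-1}g'(t) \\
 &= \left[x - \frac{g'(t)}{g(t)}\right]\frac{1}{f'(t)}
\end{aligned}
\]
We have proved the following.

Theorem 19.27 Let $\theta_{g,f}$ be a Sheffer shift. Then
1) $\theta_{g,f} = \left[x - \dfrac{g'(t)}{g(t)}\right]\dfrac{1}{f'(t)}$
2) $s_{n+1}(x) = \left[x - \dfrac{g'(t)}{g(t)}\right]\dfrac{1}{f'(t)}\, s_n(x)$ □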
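As an illustration of the Sheffer shift recurrence $s_{n+1}(x) = [x - g'(t)/g(t)]\,[f'(t)]^{-1}s_n(x)$, take the Hermite pair, assumed here to be $g(t) = e^{t^2/2}$ and $f(t) = t$. Then $g'(t)/g(t) = t$ and $f'(t) = 1$, so the recurrence reads $H_{n+1}(x) = xH_n(x) - H_n'(x)$. The sketch below iterates this on coefficient lists; the function names are ours.

```python
def next_hermite(p):
    """H_{n+1} = x H_n - H_n', on coefficient lists (constant term first)."""
    shifted = [0] + list(p)                         # x * H_n
    deriv = [i * c for i, c in enumerate(p)][1:]    # H_n'
    return [a - (deriv[i] if i < len(deriv) else 0)
            for i, a in enumerate(shifted)]

def hermite_by_shift(n):
    """H_n obtained by applying the Sheffer shift n times to H_0 = 1."""
    p = [1]
    for _ in range(n):
        p = next_hermite(p)
    return p
```

With this normalization, `hermite_by_shift(4)` is the coefficient list of $x^4 - 6x^2 + 3$.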


The Transfer Formulas 
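We conclude with a pair of formulas for the computation of associated
sequences.

Theorem 19.28 (The transfer formulas) Let $p_n(x)$ be the associated sequence
for $f(t)$. Then
1) $p_n(x) = f'(t)\left(\dfrac{f(t)}{t}\right)^{-n-1}x^n$
2) $p_n(x) = x\left(\dfrac{f(t)}{t}\right)^{-n}x^{n-1}$
Proof. First we show that 1) and 2) are equivalent. Write $g(t) = f(t)/t$. Then
\[
\begin{aligned}
f'(t)g(t)^{-n-1}x^n &= [tg(t)]'\,g(t)^{-n-1}x^n \\
 &= g(t)^{-n}x^n + tg'(t)g(t)^{-n-1}x^n \\
 &= g(t)^{-n}x^n + ng'(t)g(t)^{-n-1}x^{n-1} \\
 &= g(t)^{-n}x^n - [g(t)^{-n}]'x^{n-1} \\
 &= xg(t)^{-n}x^{n-1}
\end{aligned}
\]
To prove 1), we verify the operator conditions for an associated sequence for
the sequence $q_n(x) = f'(t)g(t)^{-n-1}x^n$. First, when $n \ge 1$ the fourth equality
above gives
\[
\begin{aligned}
\langle t^0 \mid q_n(x)\rangle &= \langle t^0 \mid g(t)^{-n}x^n - [g(t)^{-n}]'x^{n-1}\rangle \\
 &= \langle g(t)^{-n} \mid x^n\rangle - \langle [g(t)^{-n}]' \mid x^{n-1}\rangle \\
 &= \langle g(t)^{-n} \mid x^n\rangle - \langle g(t)^{-n} \mid x^n\rangle \\
 &= 0
\end{aligned}
\]
If $n = 0$, then $\langle t^0 \mid q_n(x)\rangle = 1$, and so in general, we have $\langle t^0 \mid q_n(x)\rangle = \delta_{n,0}$ as
required.

For the second required condition,
\[
\begin{aligned}
f(t)q_n(x) &= f(t)f'(t)g(t)^{-n-1}x^n \\
 &= tg(t)f'(t)g(t)^{-n-1}x^n \\
 &= nf'(t)g(t)^{-n}x^{n-1} \\
 &= nq_{n-1}(x)
\end{aligned}
\]
Thus, $q_n(x)$ is the associated sequence for $f(t)$. □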
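The second transfer formula, $p_n(x) = x\,(f(t)/t)^{-n}x^{n-1}$, lends itself to direct computation with truncated power series. The sketch below is one possible implementation under our own conventions: $f$ is a list of rational coefficients with $f_0 = 0$, $f_1 \ne 0$ and at least $n+1$ entries, series are truncated modulo $t^n$ (higher powers of $t$ annihilate $x^{n-1}$), and $t^j x^{n-1} = (n-1)_j\,x^{n-1-j}$. The function names are ours.

```python
from fractions import Fraction
from math import factorial

def series_mul(a, b, N):
    """Product of two power series (coefficient lists), modulo t^N."""
    r = [Fraction(0)] * N
    for i in range(min(len(a), N)):
        for j in range(min(len(b), N - i)):
            r[i + j] += a[i] * b[j]
    return r

def series_inv(a, N):
    """1/a(t) modulo t^N; requires a[0] != 0."""
    inv = [Fraction(1) / a[0]]
    for n in range(1, N):
        s = sum(a[k] * inv[n - k] for k in range(1, min(n, len(a) - 1) + 1))
        inv.append(-s / a[0])
    return inv

def transfer(f, n):
    """Coefficients of p_n(x) = x (f(t)/t)^(-n) x^(n-1), with p_0 = 1."""
    if n == 0:
        return [Fraction(1)]
    g = [Fraction(c) for c in f[1:n + 1]]        # g(t) = f(t)/t modulo t^n
    g_inv = series_inv(g, n)
    power = [Fraction(1)] + [Fraction(0)] * (n - 1)
    for _ in range(n):
        power = series_mul(power, g_inv, n)      # becomes g(t)^(-n) modulo t^n
    coeffs = [Fraction(0)] * n
    for j in range(n):
        # t^j x^(n-1) = (n-1)_j x^(n-1-j)
        coeffs[n - 1 - j] = power[j] * factorial(n - 1) / factorial(n - 1 - j)
    return [Fraction(0)] + coeffs                # final multiplication by x
```

For $f(t) = e^t - 1$ this should reproduce the lower factorial polynomials, and for $f(t) = te^{at}$ the Abel polynomials $x(x-an)^{n-1}$.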




A Final Remark 


Unfortunately, space does not permit a detailed discussion of examples of 
Sheffer sequences nor the application of the umbral calculus to various classical 
problems. In [105], one can find a discussion of the following polynomial 
sequences: 


The lower factorial polynomials and Stirling numbers 
The exponential polynomials and Dobinski's formula 
The Gould polynomials 

The central factorial polynomials 

The Abel polynomials 

The Mittag-Leffler polynomials 

The Bessel polynomials 

The Bell polynomials 

The Hermite polynomials 

The Bernoulli polynomials and the Euler-MacLaurin expansion 
The Euler polynomials 

The Laguerre polynomials 

The Bernoulli polynomials of the second kind 

The Poisson—Charlier polynomials 

The actuarial polynomials 

The Meixner polynomials of the first and second kinds 
The Pidduck polynomials 

The Narumi polynomials 

The Boole polynomials 

The Peters polynomials 

The squared Hermite polynomials 

The Stirling polynomials 

The Mahler polynomials 

The Mott polynomials 


and more. In [105], we also find a discussion of how the umbral calculus can be 
used to approach the following types of problems: 


The connection constants problem 

Duplication formulas 

The Lagrange inversion formula 

Cross sequences 

Steffensen sequences 

Operational formulas 

Inverse relations 

Sheffer sequence solutions to recurrence relations 
Binomial convolution 


Finally, it is possible to generalize the classical umbral calculus that we have
described in this chapter to provide a context for studying polynomial sequences
such as those of the names Gegenbauer, Chebyshev and Jacobi. Also, there is a
q-version of the umbral calculus that involves the q-binomial coefficients (also
known as the Gaussian coefficients)
\[ \binom{n}{k}_q = \frac{(1-q)\cdots(1-q^n)}{(1-q)\cdots(1-q^k)\,(1-q)\cdots(1-q^{n-k})} \]
in place of the binomial coefficients. There is also a logarithmic version of the
umbral calculus, which studies the harmonic logarithms and sequences of
logarithmic type. For more on these topics, please see [103], [106] and [107].
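For readers who want to experiment with the Gaussian coefficients, here is a small sketch that evaluates $\binom{n}{k}_q = \frac{(1-q)\cdots(1-q^n)}{(1-q)\cdots(1-q^k)\,(1-q)\cdots(1-q^{n-k})}$ at a rational value of $q$. It is only a numerical illustration under our own conventions: $q$ must not be a root of unity of order at most $n$ (in particular $q \ne 1$), and the function names are ours.

```python
from fractions import Fraction

def q_product(n, q):
    """(1 - q)(1 - q^2)...(1 - q^n); the empty product is 1."""
    result = Fraction(1)
    for i in range(1, n + 1):
        result *= 1 - Fraction(q) ** i
    return result

def q_binomial(n, k, q):
    """Gaussian coefficient evaluated at a rational q (not a root of unity)."""
    if k < 0 or k > n:
        return Fraction(0)
    return q_product(n, q) / (q_product(k, q) * q_product(n - k, q))
```

One standard sanity check is the q-analog of Pascal's rule, $\binom{n}{k}_q = \binom{n-1}{k-1}_q + q^k\binom{n-1}{k}_q$.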


Exercises 

1. Prove that $o(fg) = o(f) + o(g)$, for any $f, g \in \mathcal{F}$.

2. Prove that $o(f+g) \ge \min\{o(f), o(g)\}$, for any $f, g \in \mathcal{F}$.

3. Show that any delta series has a compositional inverse.

4. Show that for any delta series $f$, the sequence $f^k$ is a pseudobasis.

5. Prove that $\partial_t$ is a derivation.

6. Show that $f \in \mathcal{F}$ is a delta functional if and only if $\langle f \mid 1\rangle = 0$ and
$\langle f \mid x\rangle \ne 0$.

7. Show that $f \in \mathcal{F}$ is invertible if and only if $\langle f \mid 1\rangle \ne 0$.

8. Show that $\langle f(at) \mid p(x)\rangle = \langle f(t) \mid p(ax)\rangle$ for any $a \in \mathbb{C}$, $f \in \mathcal{F}$ and
$p \in \mathcal{P}$.

9. Show that $\langle te^{at} \mid p(x)\rangle = p'(a)$ for any polynomial $p(x) \in \mathcal{P}$.

10. Show that $f = g$ in $\mathcal{F}$ if and only if $f = g$ as linear functionals, which
holds if and only if $f = g$ as linear operators.

11. Prove that if $s_n(x)$ is Sheffer for $(g(t), f(t))$, then $f(t)s_n(x) = ns_{n-1}(x)$.
Hint: Apply the functionals $g(t)f(t)^k$ to both sides.

12. Verify that the Abel polynomials form the associated sequence for the Abel
functional.

13. Show that a sequence $s_n(x)$ is the Appell sequence for $g(t)$ if and only if
$s_n(x) = g(t)^{-1}x^n$.

14. If $f$ is a delta series, show that the adjoint $\lambda_f^{\times}$ of the umbral operator $\lambda_f$ is a
vector space isomorphism of $\mathcal{F}$.

15. Prove that if $T$ is an automorphism of the umbral algebra, then $T$ preserves
order, that is, $o(Tf(t)) = o(f(t))$. In particular, $T$ is continuous.

16. Show that an umbral operator maps associated sequences to associated
sequences.

17. Let $p_n(x)$ and $q_n(x)$ be associated sequences. Define a linear operator $\alpha$ by
$\alpha\colon p_n(x) \to q_n(x)$. Show that $\alpha$ is an umbral operator.

18. Prove that if $\partial_f$ and $\partial_g$ are surjective derivations on $\mathcal{F}$, then
\[ \partial_g f(t) = [\partial_f g(t)]^{-1} \]


